ON MULTIVARIATE QUANTILES UNDER PARTIAL ORDERS

By Alexandre Belloni and Robert L. Winkler 

Duke University 

This paper focuses on generalizing quantiles from the ordering 
point of view. We propose the concept of partial quantiles, which are 
based on a given partial order. We establish that partial quantiles are 
equivariant under order-preserving transformations of the data, ro- 
bust to outliers, characterize the probability distribution if the partial 
order is sufficiently rich, generalize the concept of efficient frontier, 
and can measure dispersion from the partial order perspective. 

We also study several statistical aspects of partial quantiles. We 
provide estimators, associated rates of convergence, and asymptotic 
distributions that hold uniformly over a continuum of quantile in- 
dices. Furthermore, we provide procedures that can restore mono- 
tonicity properties that might have been disturbed by estimation er- 
ror, establish computational complexity bounds, and point out a con- 
centration of measure phenomenon (the latter under independence 
and the componentwise natural order). 

Finally, we illustrate the concepts by discussing several theoretical
examples and simulations. Empirical applications to compare the intake of
nutrients within diets, to evaluate the performance of investment
funds, and to study the impact of policies on tobacco awareness are 
also presented to illustrate the concepts and their use. 

1. Introduction. The quantiles of a univariate random variable have 
proved to be a valuable tool in statistics. They provide important notions 
of location and scale, exhibit robustness to outliers, and completely charac- 
terize the random variable. Moreover, quantiles also play a significant role 
in applications. Naturally, the quantiles of a multivariate random variable 
are also of interest, and the search for a multidimensional counterpart of the 
quantiles of a random variable has attracted considerable attention in the 
statistical literature. Various definitions have been proposed and studied. 

Barnett [3], Serfling [50] and Koenker [32] provide valuable comparisons 
and surveys of different methods. Some interesting recent work is presented 
in Hallin, Paindaveine and Siman [24] (with discussions [25, 52, 59]), Kong
and Mizera [34] and Serfling [51]. A substantial part of the literature focuses 
on developing relevant measures to characterize location and scale informa- 
tion of the multivariate random variable of interest. This is usually accom- 
plished by defining a suitable nested family of sets. As discussed below, our 
focus will be on a given partial order between points instead. The incorpo- 
ration of this additional information is the distinctive feature of this work. 
Therefore, our approach is different and hence complementary to previous 
work that focuses on location and scale measures. 

The fundamental difficulty in reaching agreement on a suitable general- 
ization of univariate quantiles is arguably the lack of a natural ordering in 
a multidimensional setting. Serfling [50] points out that, as a result, "vari- 
ous ad hoc quantile-type multivariate methods have been formulated, some 
vector-valued in character, some univariate, and the term "quantile" has ac- 
quired rather loose usage" (page 214). The simplest notion of a multivariate 
quantile is that of a vector of the corresponding univariate quantiles, but this 
fails to reflect any multivariate features of the random vector. More often 
than not, attempts to take into account such multivariate features have been 
influenced by the justifiable temptation to exploit some geometric structure 
of the underlying space. For example, many approaches are based on the 
use of specific metrics to collapse the multivariate setting into a univariate 
measure. Many definitions of multivariate quantiles that use notions such 
as the distance from a central measure, norm minimization, or gradients 
immediately make the values relevant. In contrast, for univariate quantiles 
only the ordering matters, and the actual values of the variable away from 
the quantile of interest are irrelevant. 

In our work, within the definition of multivariate quantiles, the crux is 
the concept of ordering, which might or might not be related to geometric notions
of the underlying space. Our starting point will be to detach our concept 
from the geometry of the random variable, and assume that a partial order 
is provided which will be used to define the partial quantiles. This allows our 
work to focus on the minimum structure for which the problem makes sense. 
With a general partial order, as opposed to a complete order, we recognize 
that some points simply cannot be compared. Our key insight is to rely on 
a family of conditional probabilities induced by the partial order to circumvent
the lack of comparability. Such an approach yields a distinguishing feature
of the proposed partial quantiles: the reliance on the partial order. Our anal- 
ysis is close in spirit to, but still quite different from, the important work of 
Einmahl and Mason [18], who proposed a broad class of generalized quan- 
tile processes. We defer a detailed discussion to Section 4 but we anticipate 
that our definitions do not fit within the framework of [18] and most of our 
results have no parallels in [18]. 

Our main contributions are as follows. First, we propose a generalization
of quantiles based on a given partial order on the space of values of the ran- 
dom variable of interest. Index, point, surface, and comparability notions of 
the partial quantiles are studied. We establish that these partial quantiles 
have several desirable features: equivariance under monotone mappings with 
respect to the chosen partial ordering (an instrumental feature of the uni- 
variate case); generalization of the efficient frontier concept; meaningfulness 
not only in high-dimensional Euclidean spaces but also in arbitrary sets (rel- 
evant for decision making, where metrics are not available); and applicability 
even to general binary relations. 

Second, we investigate statistical estimation and inference based on fi- 
nite samples. We derive results on rates of convergence that hold uniformly 
over infinitely many quantile indices. In the analysis of the estimation prob- 
lems, we have to accommodate discontinuous criterion functions, potential 
nonuniqueness of the true parameter, and a restricted identification condi- 
tion. These difficulties lead to nonstandard rates of convergence. Also, we 
derive the asymptotic distribution for the partial quantile indices process 
(indexed by a subset of the underlying space) and for the partial quantile 
comparability where non-Gaussian limits are possible due to nonuniqueness. 

Several other results are established. Partial quantile indices and proba- 
bilities of comparisons are robust to outliers and we study when they char- 
acterize the underlying probability distribution, both important properties 
of univariate quantiles. Due to sampling error, the estimated partial quantile 
points could violate the partial order, as can happen with (univariate) quantile
regression [32]. In quantile regression, Chernozhukov, Fernández-Val and
Galichon [10, 11] based on rearrangement, Dette and Volgushev [14] based on
smoothing and monotonization, and Neocleous and Portnoy [39] based
on interpolation, show how to obtain monotone estimates of quantile curves.
In the context of partial quantiles within lattice spaces, we propose a new 
procedure to correct for this estimation error that leads to partial quan- 
tile point estimates that are monotone with respect to the partial order. 
(Under the componentwise natural ordering, we build upon the use of rearrangement
in Chernozhukov, Fernández-Val and Galichon [10, 11] to achieve
an improvement on the estimation under suitable mild conditions.) Under 
independence and the componentwise natural ordering, we also point out 
a concentration of measure and a possible "curse of dimensionality" for 
comparisons. We also define dispersion measures based on partial quan- 
tile regions. Moreover, we study the computational requirements associated 
with approximating partial quantiles. We provide interesting primitive con- 
ditions under which computation can be carried out efficiently. Finally, we 
illustrate these concepts through applications to evaluate the intake of nu- 
trients within diets, the performance of investment funds, and the impact of 
different policies on tobacco awareness. 

2. Partial quantiles. In this section, we propose a generalization of quantiles
and derive basic probabilistic properties implied by the definition of
partial quantiles. 

2.1. Definitions. Let $X$ denote an $S$-valued random variable defined on
a probability space $(\Omega, \mathcal{A}, P)$, where $S$ is an arbitrary set. Moreover, let
$\preccurlyeq$ denote a partial order defined on $S$ ($x \preccurlyeq y$ if $x$ precedes $y$). Throughout the
paper, we assume that for all $x \in S$, the events $\{X \succcurlyeq x\}$ and $\{X \preccurlyeq x\}$ are
$\mathcal{A}$-measurable. We begin by defining the set of points that can be compared
with a fixed element $x \in S$ given the partial order.

Definition 1. For any $x \in S$, the set of points comparable with $x$ under
the partial order is defined as $C(x) = \{y \in S : y \succcurlyeq x \text{ or } y \preccurlyeq x\}$. Let
$p_x = P(X \in C(x))$ denote the probability of comparison of $x$.

Comment 2.1. It follows that all definitions and results can be derived
for general binary relations $\preccurlyeq$. We focus on partial orders since these binary
relations encompass our applications and to simplify the exposition. A binary
relation $\preccurlyeq$ is a partial order if it is (i) reflexive ($x \preccurlyeq x$), (ii) transitive ($x \preccurlyeq y$
and $y \preccurlyeq z$ implies $x \preccurlyeq z$) and (iii) antisymmetric ($x \preccurlyeq y$ and $y \preccurlyeq x$ implies
$x = y$). Unless otherwise noted, we will assume that the binary relation $\preccurlyeq$ is
a partial order.

The probability of comparison $p_x$ is simply the probability of drawing
a point comparable with $x$. The usefulness of $C(x)$ relies on the fact that
conditional on the event $\{\omega \in \Omega : X(\omega) \in C(x)\}$, which hereafter we denote
simply by $C(x)$, we have

$$P(X \succ x \mid C(x)) + P(X \sim x \mid C(x)) + P(X \prec x \mid C(x)) = 1.$$

That is, conditioning on $C(x)$ avoids points that are incomparable with $x$,
making the partial order $\preccurlyeq$ "complete" with respect to $x$ [for every $y \in C(x)$
either $x \preccurlyeq y$ or $y \preccurlyeq x$]. Under this conditioning, a sensible definition for $x$
being a quantile of $X$ should involve $P(X \preccurlyeq x \mid C(x))$ and $P(X \succcurlyeq x \mid C(x))$, the
probabilities of drawing a point preceding $x$ and succeeding $x$, respectively,
under the partial order. Next, we formally define the concept of partial
quantile surfaces and indices.

Definition 2. For each $x \in S$, we define its partial quantile index as

(2.1) $\tau_x = P(X \preccurlyeq x \mid C(x))$.

Moreover, for $\tau \in (0,1)$, the $\tau$-partial quantile surface is defined as

(2.2) $Q(\tau) = \{x \in S : P(X \succcurlyeq x \mid C(x)) \ge (1-\tau),\ P(X \preccurlyeq x \mid C(x)) \ge \tau\}$.
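
As a concrete illustration of Definitions 1 and 2, the following minimal Python sketch evaluates the probability of comparison, the partial quantile index, and membership in the partial quantile surface for a discrete distribution on $\mathbb{R}^2$ under the componentwise order. The four-point uniform distribution, the helper names (`precedes`, `order_masses`, `index_and_comparability`, `in_surface`), and the restriction of the surface to support points are assumptions made only for illustration.

```python
import numpy as np

def precedes(a, b):
    """Componentwise order on R^d: a precedes b when a_j <= b_j for every j."""
    return bool(np.all(np.asarray(a) <= np.asarray(b)))

def order_masses(x, support, probs):
    """Return (P(X precedes x), P(X succeeds x), p_x) for a discrete law."""
    below = np.array([precedes(y, x) for y in support])
    above = np.array([precedes(x, y) for y in support])
    return probs[below].sum(), probs[above].sum(), probs[below | above].sum()

def index_and_comparability(x, support, probs):
    """Partial quantile index (2.1) and probability of comparison of x."""
    m_below, _, p_x = order_masses(x, support, probs)
    tau_x = m_below / p_x if p_x > 0 else 0.0
    return float(tau_x), float(p_x)

def in_surface(x, tau, support, probs):
    """Membership of x in the tau-partial quantile surface (2.2)."""
    m_below, m_above, p_x = order_masses(x, support, probs)
    if p_x == 0:
        return False
    return bool(m_above / p_x >= 1 - tau and m_below / p_x >= tau)

# Hypothetical example: uniform distribution on the corners of the unit square.
support = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])
probs = np.full(4, 0.25)

# (1, 0) is incomparable with (0, 1), so its probability of comparison is below 1.
tau_x, p_x = index_and_comparability([1, 0], support, probs)
print(tau_x, p_x)
print([in_surface(x, 0.5, support, probs) for x in support])
```

In this toy case the two incomparable corners are the only support points on the median surface, while the minimal and maximal corners are comparable with everything but sit at extreme indices.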

Partial quantile indices provide an ordering notion for each element of
$S$ relative to its comparable points. Definition 2 also defines a subset of $S$
associated with each quantile index $\tau \in (0,1)$. In the case of a univariate
random variable under the natural ordering, $Q(\tau)$ is simply the set of
$\tau$-quantiles of $X$. Note that we can have $x \in Q(\tau)$ for more than one value
of $\tau$ only if $P(X \sim x \mid C(x)) > 0$. (The same would happen in the univariate
quantile case.)

Next, we select a meaningful representative point, called a $\tau$-partial quantile
point, from each $\tau$-partial quantile surface. To do that, we use the criterion
of maximizing the probability of drawing a comparable point.

Definition 3. For $\tau \in (0,1)$, a $\tau$-partial quantile point, or simply a
$\tau$-partial quantile, is defined as any maximizer of $p_x$ over $Q(\tau)$, namely,

(2.3) $x_\tau \in \arg\max_x p_x$ s.t. $x \in Q(\tau)$.

Also, for each $\tau \in (0,1)$, let $p_\tau = p_{x_\tau} = P(X \in C(x_\tau))$ be the probability
measure of the points comparable with any $\tau$-partial quantile $x_\tau$. The set of
all $\tau$-partial quantile points is denoted by $Q^*(\tau) = \{x \in Q(\tau) : p_x = p_\tau\}$.

The lack of a complete order in S is exploited to select a representative 
point within the partial quantile surface. This approach is detached from any 
geometric aspect of S, yet it reflects the multivariate nature of the situation 
as well as the partial order. Also, note that if we have a complete order, in 
which $p_x = 1$ for all $x \in S$, then any $x \in Q(\tau)$ is a $\tau$-partial quantile. This
is exactly what happens in the univariate case, where multiplicity can also 
occur. 

Partial quantile points $x_\tau$ can also be interpreted as "approximate quantiles"
in the sense that

$$P(X \preccurlyeq x_\tau) \ge p_{x_\tau} \cdot \tau \quad\text{and}\quad P(X \succcurlyeq x_\tau) \ge p_{x_\tau} \cdot (1-\tau)$$

and that the balance is "correct" within comparable points

$$P(X \preccurlyeq x_\tau \mid C(x_\tau)) \ge \tau \quad\text{and}\quad P(X \succcurlyeq x_\tau \mid C(x_\tau)) \ge (1-\tau).$$

In fact, $x_\tau$ is the "best approximate quantile" since it is the maximizer of
the probability of comparisons given the restrictions.
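
To make the selection rule in (2.3) concrete, here is a minimal Python sketch that searches the support of a discrete distribution for the maximizers of the probability of comparison over the partial quantile surface. The four-point distribution, the function name `partial_quantile_points`, and the restriction of the search to support points are assumptions made for illustration; the conditional inequalities of (2.2) are multiplied through by $p_x$ to avoid division.

```python
import numpy as np

def partial_quantile_points(tau, support, probs):
    """Indices of support points maximizing p_x over the tau-surface, and the maximum."""
    # leq[i, j] is True when support[i] precedes support[j] componentwise.
    leq = np.all(support[:, None, :] <= support[None, :, :], axis=2)
    m_below = leq.T @ probs                      # P(X precedes support[j])
    m_above = leq @ probs                        # P(X succeeds support[i])
    p = (leq | leq.T).astype(float) @ probs      # probability of comparison
    feasible = (m_above >= (1 - tau) * p) & (m_below >= tau * p)
    if not feasible.any():
        return np.array([], dtype=int), 0.0
    p_tau = p[feasible].max()
    return np.flatnonzero(feasible & np.isclose(p, p_tau)), float(p_tau)

# Hypothetical example on the corners of the unit square with unequal masses.
support = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
probs = np.array([0.2, 0.3, 0.1, 0.4])

idx, p_tau = partial_quantile_points(0.4, support, probs)
print(support[idx], p_tau)
```

Here both incomparable corners lie on the surface for this index, and the rule picks the one with the larger probability of comparison.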

The probability of comparison plays an important role in our definitions 
and, consequently, in the interpretation of partial quantiles. It will allow 
us to quantify the gap between the interpretation of partial quantiles and 
the interpretation of traditional quantiles where all points are comparable 
to each other. We will focus on the following quantity that characterizes 
the overall comparability of partial quantile points uniformly over different 
quantiles. 

Definition 4. The partial quantile comparability is the minimum probability
of comparison associated with partial quantile points, namely

(2.4) $\mathfrak{p} = \min_{\tau \in (0,1)} p_\tau$.

When the comparability $\mathfrak{p}$ is large, the interpretation of partial quantile
points is very similar to traditional quantiles. On the other hand, if $\mathfrak{p}$ is
small, there are partial quantile indices for which the interpretation of partial
quantile points deviates considerably from that for the traditional quantile
since drawing a point that is incomparable to at least some $\tau$-partial quantile
point is likely. Clearly, if the binary relation $\preccurlyeq$ is a complete order, as for
univariate quantiles, we have $\mathfrak{p} = 1$. As a side note, (2.4) can be written as
$\mathfrak{p} = \min_{\tau \in (0,1)} \max_{x \in Q(\tau)} p_x$, so that $\mathfrak{p}$ is a saddle point of the probability of
comparison.

2.2. Structural properties. Next, we move to structural properties im- 
plied by the definition. It is notable that interesting and useful properties 
can be derived within the general case. 

We say that a mapping $h : S \to S$ is order-preserving if $x \succcurlyeq y$ implies
$h(x) \succcurlyeq h(y)$ and $x \succ y$ implies $h(x) \succ h(y)$.

Proposition 1 (Equivariance and invariance). Let $h : S \to S$ be an
order-preserving mapping. For an $S$-valued random variable $X$, let $x_\tau^X$, $Q^X(\tau)$,
$\tau_x^X$, $p_x^X$, $p_\tau^X$ and $\mathfrak{p}^X$ denote the partial quantile quantities.

Then partial quantile points and surfaces are equivariant under $h$, namely

$$x_\tau^{h(X)} = h(x_\tau^X) \quad\text{and}\quad Q^{h(X)}(\tau) = h(Q^X(\tau)),$$

and partial quantile indices and probability of comparisons are invariant
under $h$, namely

$$\tau_{h(x)}^{h(X)} = \tau_x^X, \quad p_{h(x)}^{h(X)} = p_x^X, \quad p_\tau^{h(X)} = p_\tau^X \quad\text{and}\quad \mathfrak{p}^{h(X)} = \mathfrak{p}^X.$$

Proposition 1 is simple but very useful. As with univariate quantiles un- 
der the natural ordering, any order-preserving transformation of the data 
can be dealt with by transforming the partial quantiles of $X$. For concreteness,
consider $S = \mathbb{R}^d$ with $a \succcurlyeq b$ only if $a \ge b$ componentwise. In this case,
common examples of invariant transformations are: translation ($x \mapsto x + z$),
positive scaling ($x \mapsto tx$, where $t > 0$), and componentwise monotonic
transformation [e.g., $x_j \mapsto \ln(x_j)$, where $x_j > 0$]. Note that no assumption on the
probability distribution was made in Proposition 1. 
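
Since Proposition 1 makes no assumption on the distribution, it can be checked on the empirical distribution of a sample. The following Python sketch does so for the three transformations just listed, all of which are componentwise strictly increasing and therefore preserve comparisons in both directions. The simulated integer-valued data, the shift vector, and the function name `empirical_indices` are assumptions chosen for illustration; integer data are used so that the transformations cannot create or destroy ties through rounding.

```python
import numpy as np

def empirical_indices(data):
    """Sample analogs of the partial quantile index and the probability of
    comparison at each sample point, under the componentwise order."""
    # leq[i, j] is True when X_i precedes X_j componentwise.
    leq = np.all(data[:, None, :] <= data[None, :, :], axis=2)
    n_below = leq.sum(axis=0)              # number of X_i preceding X_j
    n_comp = (leq | leq.T).sum(axis=0)     # number of X_i comparable with X_j (>= 1)
    return n_below / n_comp, n_comp / len(data)

rng = np.random.default_rng(0)
data = rng.integers(1, 50, size=(100, 2)).astype(float)  # positive, so logs exist

tau_hat, p_hat = empirical_indices(data)
transforms = {
    "translation": lambda x: x + np.array([3.0, -1.0]),
    "positive scaling": lambda x: 2.0 * x,
    "componentwise log": np.log,
}
results = {}
for name, h in transforms.items():
    tau_h, p_h = empirical_indices(h(data))
    results[name] = bool(np.array_equal(tau_hat, tau_h) and np.array_equal(p_hat, p_h))
    print(name, results[name])
```

Because the indices depend on the data only through the pairwise comparisons, the agreement is exact rather than approximate.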

In order to show symmetry, we also require assumptions on the probability 
distribution. 

Proposition 2 (Symmetry). Assume that the probability distribution of
$X$ is invariant under an order-preserving mapping $m : S \to S$, that is, $P(A) =
P(m(A))$ for every measurable $A \subset S$. Then if $x_\tau$ is a partial quantile point,
$m(x_\tau)$ is also a partial quantile point; if $z \in Q(\tau)$, then $m(z) \in Q(\tau)$; and

$$\tau_x = \tau_{m(x)}.$$

The next proposition shows that transitivity in the partial order is automatically
transferred to the partial quantile indices.

Proposition 3 (Transitivity). Assume that the binary relation $\preccurlyeq$ is
transitive. Then we have that $x \succcurlyeq x'$ implies that $\tau_x \ge \tau_{x'}$.
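
The monotonicity in Proposition 3 can be observed directly on the empirical distribution of a sample, which is itself a probability distribution on which the componentwise order is transitive. The short Python sketch below counts violations over all ordered pairs of sample points; the simulated Gaussian data and the variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(size=(150, 3))

# leq[i, j] is True when X_i precedes X_j componentwise.
leq = np.all(data[:, None, :] <= data[None, :, :], axis=2)
# Empirical partial quantile index at each sample point X_j.
tau_hat = leq.sum(axis=0) / (leq | leq.T).sum(axis=0)

# For every pair with X_i preceding X_j, compare the two indices.
i, j = np.nonzero(leq)
violations = int(np.sum(tau_hat[j] < tau_hat[i]))
print(violations)
```

Moving up in the order can only enlarge the set of preceding points and shrink the set of strictly succeeding points, which is why no pair is expected to violate the inequality.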



3. Estimation of partial quantiles. Up to now, we have studied properties
of the partial quantiles when the probability distribution of the random
variable of interest is known. Next, we focus on exploring sample-based
partial quantiles viewed as estimates of their population counterparts.
Following standard notation in the empirical process literature, we
let $\mathbb{P}_n(A) = \frac{1}{n}\sum_{i=1}^n 1\{X_i \in A\}$. Also, we let $\mathbb{P}_n(A \mid B) = \mathbb{P}_n(A \cap B)/\mathbb{P}_n(B)$ if
$\mathbb{P}_n(B) > 0$ and zero otherwise. We carry out all of the asymptotic analysis
as $n \to \infty$. We use the notation $a \lesssim b$ to denote that $a = O(b)$, that is, $a \le cb$
for all sufficiently large $n$, for some constant $c > 0$ that does not depend
on $n$, and we use $a \lesssim_P b$ to denote that $a = O_P(b)$. We also use the notation
$a \vee b = \max\{a, b\}$ and $a \wedge b = \min\{a, b\}$.

3.1. Assumptions. We base our analysis in this and the next section on 
high-level conditions E.1-E.6. These high-level conditions are implied by 
a variety of more primitive conditions as discussed below. 



E.1. Random sampling. The data $X_i$, $i = 1, \ldots, n$, are an i.i.d. sequence
of $S$-valued random variables.



The next condition imposes regularity on the family of sets induced by
the partial relation

(3.1) $\mathcal{T} = \{C(x),\ \{y \in S : y \succcurlyeq x\},\ \{y \in S : y \preccurlyeq x\} : x \in S\}$.


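E.2. Regular partial order. For $p \in (0,1)$, there is a positive number $v(p)$
such that

$$\sup_{x \in S,\ p_x \ge p} \left| \frac{\mathbb{P}_n(X_i \preccurlyeq x) - P(X \preccurlyeq x)}{p_x} \right| \vee \left| \frac{\mathbb{P}_n(X_i \succcurlyeq x) - P(X \succcurlyeq x)}{p_x} \right| \vee \left| \frac{\hat p_x - p_x}{p_x} \right| \lesssim_P \sqrt{v(p)/n}.$$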

Condition E.2 ensures that the partial order is well-behaved for a uniform
law of large numbers to hold over the sets $\{X \preccurlyeq x\}$, $\{X \succcurlyeq x\}$, and $C(x)$ for
all $x$ in

(3.2) $\mathcal{C}_p = \{x \in S : p_x \ge p\}$,

that is, over points with a minimum requirement on the probability of
comparison. Condition E.2 is implied by several more primitive conditions on $\mathcal{T}$
[e.g., if $\mathcal{T}$ is a Vapnik-Chervonenkis class with VC index $v(\mathcal{T}) < \infty$ and mild
measurability conditions]. We refer to Alexander [1], Pollard [43] and Giné
and Koltchinskii [22] for several results on deriving bounds for $v(p)$ under
primitive assumptions. A technical remark is that we require the normalization
factor to be $p_x$ for all three terms, which is considerably weaker than
using $P(X \preccurlyeq x)$ and $P(X \succcurlyeq x)$.

Alternatively, we could derive all of our results under the condition

(3.3) $\sup_{A \in \mathcal{T}} |\mathbb{P}_n(A) - P(A)| \lesssim_P \sqrt{v(\mathcal{T})/n}$.

However, (3.3) might not lead to results as sharp as E.2 achieves when
$p_x$ is small. We refer to Dudley [16] and van der Vaart and Wellner [57]
for a complete treatment to derive bounds on $v(\mathcal{T})$ leading to (3.3). Note
that if condition (3.3) holds, then condition E.2 is satisfied with $v(p) =
v(\mathcal{T})/p^2$. It is convenient to keep in mind the case $0 < p < \mathfrak{p}/2$, where $\mathfrak{p}$ is
the partial quantile comparability in (2.4), for which
all partial quantile points $x_\tau$ are contained in $\mathcal{C}_p$ and therefore covered by
condition E.2.

Next, we consider the following identification and regularity
conditions relating probability of comparisons and a metric $d(\cdot,\cdot)$ for $S$.

E.3. Identification condition. There are positive constants $c$ and $\alpha > 1$
such that for every $x \in Q(\tau)$, we have

$$p_\tau - p_x \ge c \wedge \inf_{x_\tau \in Q^*(\tau)} d(x_\tau, x)^\alpha.$$

E.4. Continuity of partial quantile points. For a compact set of quantile
indices $\mathcal{U} \subset (0,1)$, let $\tau \in \mathcal{U}$ and let $\tau'$ be in a neighborhood of $\tau$. For every
$x_\tau \in Q^*(\tau)$, there exists $x_{\tau'} \in Q^*(\tau')$ such that:

(i) $|p_\tau - p_{\tau'}| \lesssim |\tau - \tau'|^\gamma$ and (ii) $d(x_\tau, x_{\tau'}) \lesssim |\tau - \tau'|$.

E.5. Empirical error of probability of comparisons. We have that

$$\sup_{\tau \in \mathcal{U}}\ \sup_{x_\tau \in Q^*(\tau)}\ \sup_{y \in S,\ d(x_\tau, y) < r} |\hat p_{x_\tau} - p_{x_\tau} - (\hat p_y - p_y)| \lesssim_P \phi_n(r)/\sqrt{n},$$

where $\phi_n : \mathbb{R}_+ \to \mathbb{R}_+$ is such that $r \mapsto \phi_n(r)$ is nondecreasing and concave,
and $r \mapsto \phi_n(r)/r^\kappa$ is decreasing for some $\kappa < \alpha$.

Condition E.3 is a restricted identification condition, that is, $x_\tau$ is a
maximizer of the probability of comparison only over $Q(\tau)$. Moreover, it
allows for partially identified models in the spirit of Chernozhukov, Hong and
Tamer [12]. Condition E.4 requires that the set-valued mapping $\tau \mapsto Q^*(\tau)$
of partial quantile points is a continuous correspondence over $\mathcal{U}$. However, it
does not restrict $Q^*(\tau)$ to be a singleton, convex, or even bounded.
Condition E.5 is a standard condition on the criterion function for deriving rates
of convergence of M-estimators (see, e.g., van der Vaart and Wellner [57],
Theorem 3.2.5). Bounds for $\phi_n$ are available in the literature for a variety
of classes of functions (see van der Vaart and Wellner [57]).

Finally, in order to establish functional central limit theorems, the following
mild assumption is imposed on the class of sets $\mathcal{T}$ as defined
in (3.1).

E.6. Gaussian process in $\mathcal{T}$. For each $n \ge 1$, the process indexed by $\mathcal{T}$

$$\alpha_n(A) = \sqrt{n}(\mathbb{P}_n(A) - P(A)), \quad A \in \mathcal{T},$$

converges weakly in $\ell^\infty(\mathcal{T})$ to a bounded, mean zero Gaussian process $Z_P$,
indexed by $\mathcal{T}$ with covariance function $P(A \cap B) - P(A)P(B)$ for $A,
B \in \mathcal{T}$.

Condition E.6 is directly satisfied if the class of sets $\mathcal{T}$ satisfies uniform
entropy or bracketing conditions and mild measurability conditions
(see [57]).

Next, we verify these conditions for our main motivational examples. 

Example 1 (Convex cone partial order). Let $X$ be an $\mathbb{R}^d$-valued random
variable with a bounded and differentiable probability density function.
Consider the partial order given by $a \succcurlyeq b$ only if $a - b \in K$, where $K$ is a proper
convex cone (nonempty interior, and does not contain a line). In this case,
we have $P(X \succcurlyeq x) = P(x + K)$ and $P(X \preccurlyeq x) = P(x - K)$.

Lemma 1. Consider the convex cone partial order setup with a compact
set $\mathcal{U} \subset (0,1)$, and let $X$ be an $\mathbb{R}^d$-valued random variable with a bounded and
differentiable probability density function. Then, under i.i.d. sampling of
$X$ (condition E.1), we have that E.2 with $v(p) \lesssim d/p^2$, E.5 with $\phi_n(r) \lesssim
(r^{1/2} + n^{-1/4})\sqrt{\log n}$ and $d(x,y) = \|x - y\|$, and E.6 hold. Assume further
that $X$ has convex support and the probability density function is strictly
positive in the interior of the support. Then E.3 holds with $d(x,y) = \|x - y\|$
and $\alpha = 2$, E.4(i) holds with $\gamma = 1$, and the mapping $\tau \mapsto Q^*(\tau)$ is upper
semi-continuous.

Example 2 (Acyclic directed graph partial order). Let $X$ be an $S$-valued
random variable where $|S| < \infty$. The partial order is described by an
acyclic directed graph, that is, x y if there is a directed path from x to y 
in the graph. 

Lemma 2. Consider a space $S$, with $|S| < \infty$, a partial order defined over
$S$ by an acyclic directed graph, and let $X$ be an $S$-valued random variable.
Then, under i.i.d. sampling of $X$ (condition E.1), we have that E.2 with
$v(p) \lesssim (\log |S|)/p^2$. Moreover, for $d(x,y) = 1\{x \ne y\}$, we have that E.3 with
any $\alpha > 0$, E.5 with $\phi_n(r) \lesssim 1\{r > 0\}\sqrt{\log |S|}$ and E.6 hold. Moreover, E.4
holds with any $\gamma > 0$ if the compact set $\mathcal{U}$ does not contain a particular
finite set of indices.

In Section 5, we discuss other examples where conditions E.1-E.6 hold. 

3.2. Rate for partial quantile indices. We start by considering the
estimation of the partial quantile indices $\tau_x$ associated with each $x \in S$, as
defined in (2.1). In order to estimate this parameter, we define the estimator

(3.4) $\hat\tau_x = \mathbb{P}_n(X_i \preccurlyeq x \mid C(x))$ for each $x \in S$.

A fundamental departure from the univariate case arises from the lack of 
comparability between some points. This will oblige us to restrict the set on 
which uniform convergence is achieved. The next result establishes that the 
convergence of partial quantile indices is uniform over $\mathcal{C}_p$, which from (3.2)
is the set of points for which the probability of drawing a comparable point 
is at least p. 

Theorem 1 (Uniform rate for partial quantile indices). Assume that
conditions E.1 and E.2 hold. Then for any $p \in (0,1)$, we have

$$\sup_{x \in S,\ p_x \ge p} |\hat\tau_x - \tau_x| \lesssim_P \sqrt{v(p)/n}.$$

The convergence is uniform over the set $\mathcal{C}_p$ under the condition that $v(p) =
o(n)$, which allows for $v(p)$ to grow, that is, for $p$ to diminish, as a function
of the sample size. That is of interest to achieve convergence in the whole 
space as n grows, and for increasing-dimension frameworks as proposed by 
Huber [27]. Theorem 1 allows for the estimation of extreme partial quantile 
indices as long as they have a reasonable probability of comparison. 
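
A small simulation can illustrate the uniform convergence in Theorem 1. The Python sketch below takes $X$ uniform on $[0,1]^2$ with independent coordinates under the componentwise order, for which $P(X \preccurlyeq x) = x_1 x_2$ and $P(X \succcurlyeq x) = (1-x_1)(1-x_2)$, and reports the largest error of the estimator (3.4) over grid points whose probability of comparison is at least a given threshold. The distribution, the grid, the sample sizes, the thresholds, and the function names are all assumptions made for illustration.

```python
import numpy as np

def tau_true(x):
    """Population index and probability of comparison for X uniform on
    [0,1]^2 with independent coordinates (componentwise order)."""
    below = x[:, 0] * x[:, 1]                  # P(X precedes x)
    above = (1 - x[:, 0]) * (1 - x[:, 1])      # P(X succeeds x)
    return below / (below + above), below + above

def tau_hat(x, sample):
    """Estimator (3.4) evaluated at each row of x."""
    below = np.all(sample[None, :, :] <= x[:, None, :], axis=2).sum(axis=1)
    above = np.all(sample[None, :, :] >= x[:, None, :], axis=2).sum(axis=1)
    comp = below + above                       # ties have probability zero here
    return np.where(comp > 0, below / np.maximum(comp, 1), 0.0)

ticks = np.linspace(0.05, 0.95, 19)
grid = np.array([[a, b] for a in ticks for b in ticks])
tau, p = tau_true(grid)

rng = np.random.default_rng(2)
for n in (250, 1000, 4000):
    sample = rng.uniform(size=(n, 2))
    err = np.abs(tau_hat(grid, sample) - tau)
    for p_min in (0.5, 0.2):
        print(n, p_min, err[p >= p_min].max())
```

Lowering the threshold admits points with fewer comparable observations, so the reported maximum error is typically larger for the smaller threshold at a given sample size.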

This result highlights the difficulty of estimating properly the quantile index $\tau_x$
of points for which comparable points are rare. Intuitively, if $p_x < 1/n$ there
is a nonnegligible probability that our sample might miss $C(x)$ completely,
since

$$P(X_i \notin C(x),\ i = 1, \ldots, n) = (1 - p_x)^n \ge (1 - 1/n)^n,$$

which is bounded away from zero for $n \ge 2$ and creates ambiguity regarding
the choice of the partial quantile index of $x$.
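
The size of this miss probability is easy to tabulate. The Python sketch below evaluates $(1 - 1/n)^n$, the probability that none of $n$ independent draws falls in $C(x)$ in the boundary case $p_x = 1/n$; the range of sample sizes is an arbitrary choice made for illustration.

```python
import numpy as np

n = np.arange(2, 10001)
# P(no X_i falls in C(x)) in the boundary case p_x = 1/n.
miss_prob = (1.0 - 1.0 / n) ** n
print(miss_prob.min(), miss_prob[-1], np.exp(-1.0))
```

The sequence increases from 1/4 at $n = 2$ toward $e^{-1}$, so for $p_x$ at or below $1/n$ the sample misses $C(x)$ entirely with probability at least one quarter.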

Within $\mathcal{C}_p$, the estimation of the probability of comparison $p_x$ holds uniformly
directly from E.2. However, it is typical for this to hold uniformly
over S in many cases of interest. 

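For $\tau \in (0,1)$, the natural sample analog of partial quantile surfaces (2.2)
is given by

(3.5) $\hat Q(\tau) = \{x \in S : \mathbb{P}_n(X_i \succcurlyeq x \mid C(x)) \ge (1-\tau),\ \mathbb{P}_n(X_i \preccurlyeq x \mid C(x)) \ge \tau\}$.

From Theorem 1 it follows that if $x \in \hat Q(\tau)$ and $p_x \ge p$, then $x \in Q(\tau')$, where
$|\tau - \tau'| \lesssim_P \sqrt{v(p)/n}$.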

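3.3. Rate for partial quantile points. Next, we turn to the estimation of
partial quantile points. We are also interested in deriving rates uniformly
over a set of quantile indices. We will consider uniform estimation over
a compact set $\mathcal{U} \subset (0,1)$. Note that, by definition, for any $\tau \in \mathcal{U}$ we have
$p_\tau \ge \mathfrak{p}$, where $\mathfrak{p}$ is the partial quantile comparability in (2.4). Intuitively, this ensures that observations are likely to be on the
comparable set of partial quantile points as long as $\mathfrak{p}$ is not too small. We
consider the following estimator:

(3.6) $\hat x_\tau \in \arg\max_{x \in S} \hat p_x$ s.t. $\mathbb{P}_n(X_i \succcurlyeq x) \ge (1-\tau) \cdot \hat p_x - \epsilon_n$, $\mathbb{P}_n(X_i \preccurlyeq x) \ge \tau \cdot \hat p_x - \epsilon_n$,

where $\epsilon_n$ is a slack parameter that goes to zero (see Comment 3.1 below).
We denote the optimal value in (3.6) by

$$\hat p_\tau = \hat p_{\hat x_\tau} = \mathbb{P}_n(C(\hat x_\tau)).$$
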

Comment 3.1. The introduction of $\epsilon_n$ aims to ensure that the feasible
set in (3.6) is nonempty uniformly over $\tau \in \mathcal{U}$ with high probability. It
suffices to choose $\epsilon_n$ to bound discontinuities of functions in $\mathcal{T}$ associated with
partial quantile points, namely

$$\epsilon_n^{o} := 2 \sup_{\tau \in \mathcal{U}}\ \sup_{x_\tau \in Q^*(\tau)}\ \limsup_{x \to x_\tau} |\mathbb{P}_n(X_i \preccurlyeq x) - \mathbb{P}_n(X_i \preccurlyeq x_\tau)| \vee |\mathbb{P}_n(X_i \succcurlyeq x) - \mathbb{P}_n(X_i \succcurlyeq x_\tau)| \vee |\hat p_x - \hat p_{x_\tau}|.$$

In the convex cone partial order described in Example 1, if $X$ is an
$\mathbb{R}^d$-valued random variable with no point mass, with probability one it follows
that $\epsilon_n^{o} \le 2d/n$. In the case of discrete spaces like Example 2, we can take
$\epsilon_n = 0$ for $n$ sufficiently large. In more general cases, it also suffices to choose
$\epsilon_n$ to majorize $\epsilon_n^{u} := \sup_{x \in S,\ p_x \ge p} |\hat\tau_x - \tau_x|$. Under E.1 and E.2, Theorem 1
ensures that $\epsilon_n^{u} \lesssim_P \sqrt{v(p)/n}$. The latter simplifies the analysis considerably
and does not affect the final rate of convergence of the estimator, but could
introduce a $\sqrt{n}$-bias in the partial quantile index of the estimator of the
partial quantile point (see Theorem 2 and Corollary 2 below). We explicitly
allow for either choice in Theorem 2 since it automatically leads to practical
choices of $\epsilon_n$ in cases of interest, including Example 1.
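
The estimator in (3.6) can be sketched in a few lines of Python for the componentwise order. The version below maximizes the empirical probability of comparison subject to two slack-relaxed constraints that mirror the two inequalities in (2.2). Restricting the search to the sample points themselves, the function name `estimate_partial_quantile_point`, and the small data set are assumptions made for illustration, not part of the definition.

```python
import numpy as np

def estimate_partial_quantile_point(data, tau, eps):
    """Search the sample points for a maximizer of the empirical probability
    of comparison subject to the slack-relaxed constraints."""
    n = len(data)
    # leq[i, j] is True when X_i precedes X_j componentwise.
    leq = np.all(data[:, None, :] <= data[None, :, :], axis=2)
    f_below = leq.sum(axis=0) / n              # P_n(X_i precedes x) at x = X_j
    f_above = leq.sum(axis=1) / n              # P_n(X_i succeeds x) at x = X_j
    p_hat = (leq | leq.T).sum(axis=0) / n      # empirical probability of comparison
    feasible = (f_above >= (1 - tau) * p_hat - eps) & (f_below >= tau * p_hat - eps)
    if not feasible.any():
        return None, 0.0
    best = int(np.argmax(np.where(feasible, p_hat, -np.inf)))
    return data[best], float(p_hat[best])

# Hypothetical discrete sample on the corners of the unit square.
data = np.array([[0, 0]] * 2 + [[1, 0]] * 3 + [[0, 1]] + [[1, 1]] * 4, dtype=float)

# Zero slack, in line with the remark on discrete spaces in Comment 3.1.
x_hat, p_best = estimate_partial_quantile_point(data, tau=0.4, eps=0.0)
print(x_hat, p_best)
```

With a positive slack the feasible set grows, which is the mechanism the comment relies on to keep it nonempty; in this toy sample a large slack would also admit the minimal corner, so the slack should be kept small relative to the gaps in the constraints.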

In contrast to the estimation of partial quantile indices, where the con- 
vergence is independent of the underlying space S, the estimation in (3.6) 
brings forth the need to work with a metric to measure the distance in S 
between the estimated and true parameters. It must be noted that the choice 
of metric might be application dependent. A possible choice of metric that 
relies completely on the partial order to avoid the geometry of S is given 
by 

(3.7) $d(w, z) = P(\{X \succcurlyeq w\} \,\Delta\, \{X \succcurlyeq z\}) + P(\{X \preccurlyeq w\} \,\Delta\, \{X \preccurlyeq z\})$,

where $A \,\Delta\, B = (A \cap B^c) \cup (B \cap A^c)$ denotes the symmetric difference between
two sets. A typical choice of metric in many applications when $S = \mathbb{R}^d$, which
is connected to the geometry, is given by the $\ell_2$-norm $d(w,z) = \|w - z\|$.
Moreover, some identification condition with respect to the particular metric
needs to hold, in our case E.3.
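
For intuition about the order-based distance in (3.7), the following Python sketch evaluates a sample version of it under the componentwise order, with the population probability replaced by the empirical measure. That replacement, the function name `order_metric`, and the small data set are assumptions made for illustration.

```python
import numpy as np

def order_metric(w, z, sample):
    """Sample version of (3.7): empirical mass of the symmetric differences
    of the succeeding sets and of the preceding sets of w and z."""
    w, z = np.asarray(w, dtype=float), np.asarray(z, dtype=float)
    above_w = np.all(sample >= w, axis=1)
    above_z = np.all(sample >= z, axis=1)
    below_w = np.all(sample <= w, axis=1)
    below_z = np.all(sample <= z, axis=1)
    # Elementwise XOR marks observations in exactly one of the two sets.
    return float(np.mean(above_w ^ above_z) + np.mean(below_w ^ below_z))

# Hypothetical discrete sample on the corners of the unit square.
sample = np.array([[0, 0]] * 2 + [[1, 0]] * 3 + [[0, 1]] + [[1, 1]] * 4, dtype=float)

print(order_metric([1, 0], [0, 1], sample))
print(order_metric([1, 0], [1, 0], sample))
```

Two points are close in this sense when almost the same observations precede them and almost the same observations succeed them, regardless of their Euclidean distance.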

In the analysis of the rate of convergence, one needs to account for
nonstandard issues: the underlying parameter might not be unique, the empirical
criterion function lacks continuity, the identification condition is restricted,
and the constraints in (3.6) define a random set. For instance, the lack of
continuity of indicator functions will lead to $\phi_n(r) \lesssim (r^{1/2} + n^{-1/4})\sqrt{\log n}$ in
many cases of interest and would not allow for the usual $\sqrt{n}$-rate in general.
Examples of nonstandard rates of convergence are given in Kim and
Pollard [31] and van der Vaart and Wellner [57]. Moreover, for each quantile
index $\tau \in (0,1)$, the identification condition holds only within $Q(\tau)$ instead of
over the entire space $S$. That can lead to a slower rate of convergence since
the partial quantile surface $Q(\tau)$ is unknown and needs to be replaced by
a parameter set that is random.

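Theorem 2 (Uniform rate for partial quantile points). Consider a compact
set of quantiles $\mathcal{U} \subset (0,1)$ and let $\epsilon_n \ge \epsilon_n^{o} \wedge \epsilon_n^{u}$. Assume that
conditions E.1-E.5 hold for $\mathcal{U}$ and some metric $d(\cdot,\cdot)$. Then, provided that
$v(\mathfrak{p}) = o(n\mathfrak{p}^2)$, we have

$$\sup_{\tau \in \mathcal{U}}\ \inf_{x_\tau \in Q^*(\tau)} d(\hat x_\tau, x_\tau) \lesssim_P \left( \frac{v(\mathfrak{p})}{n} + \frac{\epsilon_n^2}{\mathfrak{p}^2} \right)^{(1/2) \wedge \gamma/(2\alpha)} \vee r_n^{-1},$$

where $r_n$ satisfies $r_n^{\alpha}\, \phi_n(1/r_n) \lesssim \sqrt{n}$.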



In the typical case of $\phi_n(r) \lesssim (r^{1/2} + n^{-1/4})\sqrt{\log n}$, if $\gamma/\alpha = 1/2$, we have
an $n^{1/4}$-rate of convergence, and if $\gamma/\alpha = 1$ we have an $(n/\log n)^{1/3}$-rate
of convergence. Under mild regularity conditions, the logarithmic term can
be removed in the latter case if we are interested in a single quantile index,
recovering an $n^{1/3}$-rate of convergence, as in [31]. However, it is instructive
to revisit Theorem 2 in the case of a complete order, for which it turns out
that Theorem 2 implies a $\sqrt{n}$-rate of convergence.

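Corollary 1. Under E.1, E.2 and E.4(ii), if the binary relation is
a complete ordering, for a compact set $\mathcal{U} \subset (0,1)$ and $\epsilon_n := \epsilon_n^{o} \wedge \epsilon_n^{u}$, we have

$$\sup_{\tau \in \mathcal{U}}\ \inf_{x_\tau \in Q^*(\tau)} d(\hat x_\tau, x_\tau) \lesssim_P \sqrt{v(1)/n}.$$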

The presence of a complete order resolves the issues with the restricted
identification condition and discontinuity of the criterion function since the
criterion function becomes constant, namely $\hat p_x = p_x = 1$ for all $x \in S$. Also,
in this case, the multiplicity of partial quantiles is the same multiplicity as
in the univariate quantile under the natural ordering, $Q^*(\tau) = Q(\tau)$.

Finally, we note that in discrete spaces $S$ with $|S| < \infty$, like Example 2,
for $n$ sufficiently large, with high probability we perfectly recover the partial
quantile points associated with most indices [a consequence of Lemma 2 and
the metric $d(x,y) = 1\{x \ne y\}$].

3.4. Asymptotic distributions. In this section, we discuss the derivation 
of asymptotic distributions of quantities defined in this paper. 



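Theorem 3 (Asymptotic distribution of partial quantile indices). Let
$p > 0$ be fixed, and assume that conditions E.1, E.2 and E.6 hold. Then, if
$v(p) = o(n)$, for any $x \in \mathcal{C}_p$

$$\sqrt{n}(\hat\tau_x - \tau_x) \rightsquigarrow N\left(0, \frac{\tau_x(1 - \tau_x)}{p_x}\right).$$

Moreover, the process $\beta_n(x) = \sqrt{n}(\hat\tau_x - \tau_x)$ indexed by $\mathcal{C}_p$ converges weakly
in $\ell^\infty(\mathcal{C}_p)$ to a bounded, mean zero Gaussian process $G_P$ indexed by $\mathcal{C}_p$ with
covariance function given by

$$\Omega_{z,y} = \tau_z \tau_y \left( \frac{P(X \preccurlyeq z \cap X \preccurlyeq y)}{P(X \preccurlyeq z)\,P(X \preccurlyeq y)} + \frac{P(C(z) \cap C(y))}{p_z\, p_y} - \frac{P(C(z) \cap X \preccurlyeq y)}{p_z\, P(X \preccurlyeq y)} - \frac{P(X \preccurlyeq z \cap C(y))}{P(X \preccurlyeq z)\, p_y} \right)$$

for any $z, y \in \mathcal{C}_p$.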



Theorem 3 characterizes the empirical process associated with the esti- 
mation of partial quantile indices. Moreover, it allows us to make inference 
on the unknown partial quantile index associated with the estimated partial 
quantile point process. 
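
As an illustration of how Theorem 3 could be used, the sketch below computes plug-in estimates of a partial quantile index and a probability of comparison under the componentwise natural order on $\mathbb{R}^d$, together with a standard error based on the variance $\tau_x(1-\tau_x)/p_x$. The function names `precedes` and `estimate_index` are hypothetical, and the plug-in form (sample shares of preceding and comparable points) is assumed here rather than taken from the estimator definitions earlier in the paper.

```python
import math
import random


def precedes(a, b):
    """Componentwise natural order on R^d: a precedes b if a_j <= b_j for all j."""
    return all(aj <= bj for aj, bj in zip(a, b))


def estimate_index(sample, x):
    """Plug-in estimates (tau_hat, p_hat, se) at the point x.

    tau_hat: share of comparable sample points that precede x.
    p_hat:   share of sample points that are comparable to x.
    se:      standard error implied by the variance tau(1 - tau)/p.
    """
    n = len(sample)
    below = sum(1 for xi in sample if precedes(xi, x))
    above = sum(1 for xi in sample if precedes(x, xi))
    # Points equal to x are counted in both sums, so subtract them once.
    ties = sum(1 for xi in sample if precedes(xi, x) and precedes(x, xi))
    comparable = below + above - ties
    if comparable == 0:
        return float("nan"), 0.0, float("nan")
    p_hat = comparable / n
    tau_hat = below / comparable
    se = math.sqrt(tau_hat * (1.0 - tau_hat) / (p_hat * n))
    return tau_hat, p_hat, se


if __name__ == "__main__":
    rng = random.Random(0)
    sample = [(rng.random(), rng.random()) for _ in range(5000)]
    print(estimate_index(sample, (0.5, 0.5)))
```

For the uniform distribution on the unit square, the population values at $x = (1/2, 1/2)$ are $\tau_x = 1/2$ and $p_x = 1/2$, so the printed estimates should be close to those values.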

Corollary 2. Assume that the conditions of Theorem 2 and E.6 hold.
Then, uniformly over $\tau \in \mathcal{U}$, we have

$$\sqrt{n}(\tau_{\hat{x}_\tau} - \tau) = G_P(\hat{x}_\tau) + o_P(1) + \sqrt{n}(\hat{\tau}_{\hat{x}_\tau} - \tau),$$

where $\sqrt{n}(\hat{\tau}_{\hat{x}_\tau} - \tau)$ is observed.

Since the last term is observed in the estimation,
Corollary 2 can be used for inference. In particular, if $P(X \succcurlyeq x)$, $P(X \preccurlyeq x)$
and $p_x$ are continuous in $x$, we have $\sqrt{n}|\hat{\tau}_{\hat{x}_\tau} - \tau| = O_P(\epsilon_n \sqrt{n}/p)$. In that
case, if $\epsilon_n = o(p/\sqrt{n})$, it establishes that the partial quantile index of the
estimated partial quantile point is $\sqrt{n}$-consistent.

Finally, we turn to the estimation of the partial quantile comparability 
that aims to characterize the overall comparability of points. We consider 
the estimator given by 

(3.8) $$\hat{p} = \min_{\tau \in \mathcal{U}} \hat{p}_\tau,$$

where $\mathcal{U} \subset (0,1)$ is a sufficiently large compact set. The next result studies
the properties of this estimator. It is interesting to note that one can estimate
this quantity at a $\sqrt{n}$-rate under mild regularity conditions.
We use the following notation. For $\tau \in (0,1)$, let

$$Z_P(\tau) = \sup_{x_\tau \in Q^*(\tau)} Z_P(C(x_\tau)),$$

where $Z_P$ is a Gaussian process defined as in E.6.

Theorem 4 (Asymptotic distribution of partial quantile comparabil- 
ity). Consider a compact set of quantiles U C (0,1), let e n > A , 
e^ = o(ra -1 / 2 ), and assume v(p) = o(np 2 ) and that E.1-E.6 hold. Assume 
that the function $\tau \mapsto p_\tau$ is twice continuously differentiable with a unique
minimum, that is, $p = p_{\tau^*}$ for a unique $\tau^* \in \operatorname{int} \mathcal{U}$. Then

$$\sqrt{n}(\hat{p} - p) = o_P(1) + Z_P(\tau^*).$$

Theorem 4 shows that we have a Gaussian limit for $\sqrt{n}(\hat{p} - p)$ only if the
set $Q^*(\tau^*)$ is single-valued. Otherwise we should expect non-Gaussian limits.
Similar findings of non-Gaussian limits within generalizations of quantiles
have been reported in [18]; see Section 4 for a detailed discussion.

4. Additional issues. In this section, we discuss several other relevant 
issues. First, we discuss robustness to outliers. Next, we study monotonicity 
properties of the underlying partial quantiles and their sample counterparts. 
We provide conditions under which partial quantile indices and probabilities 
of comparison characterize completely the underlying probability distribution.
Then we establish that under independence and $(\mathbb{R}^d, \geq)$, there is a concentration
of measure for partial quantile indices and points. We also develop
dispersion measures based on partial quantiles. Computational tractability 
of computing partial quantiles of a random variable with known probability 
distribution is then considered. Finally, we have a detailed comparison with 
the generalized quantile processes developed in [18]. 

4.1. Robustness to outliers. Next, we investigate the robustness to outliers
of partial quantile indices and probabilities of comparison. To do
that, we consider the influence function of these quantities. Let $F$ denote the
distribution of $X$ and $F_\epsilon$ denote the distribution contaminated by $y \in S$,

$$F_\epsilon = \epsilon \delta_y + (1 - \epsilon) F.$$

Viewing the quantities as functions of the probability distribution, we have
$\tau_x(F) = \tau_x$ and $p_x(F) = p_x$. Thus, $\tau_x(F_\epsilon)$ and $p_x(F_\epsilon)$ are the partial quantile
index and probability of comparison associated with $x$ for the contaminated
distribution. Recall that the influence function of a function $\theta(\cdot)$ at $F$ and $y$
is defined as

$$IF_\theta(y, F) = \lim_{\epsilon \to 0} \frac{\theta(F_\epsilon) - \theta(F)}{\epsilon}.$$

The following result follows from direct calculation.

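Lemma 3 (Influence functions). The influence functions for partial quantile
indices and probabilities of comparison are given by

$$IF_{\tau_x}(y, F) = \frac{1\{y \preccurlyeq x\} - \tau_x 1\{y \preccurlyeq x \cup y \succcurlyeq x\}}{p_x}$$

and

$$IF_{p_x}(y, F) = 1\{y \preccurlyeq x \cup y \succcurlyeq x\} - p_x.$$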

As in the case of univariate quantiles, the influence functions do not depend
on the exact "place" of $y$. They only depend on whether $y$ precedes
$x$, $y$ is incomparable to $x$, or $x$ precedes $y$. Thus, an outlier cannot have much
impact on probabilities of comparison, nor on partial quantile indices if $p_x$ is far from
zero.

Note that partial quantile points are defined based on $p_x$ and $\tau_x$. Nonetheless,
the influence function associated with partial quantile points is not defined
in the generality of the paper. In particular, we cannot take differences
between elements of $S$ unless additional structure is imposed. One could
generalize the influence function to $\lim_{\epsilon \to 0} d(x_\tau(F), x_\tau(F_\epsilon))/\epsilon$ for some metric
$d$ defined on $S$. However, extending the notion of the influence function
is outside the scope of this work.
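
The influence functions of Lemma 3 depend on $y$ only through two indicators, which the following sketch makes explicit. It also evaluates $\tau_x$ under the contaminated distribution $F_\epsilon$ so that the formula can be compared with a finite difference. All function names are hypothetical, and the contaminated index is computed under the assumption that $\tau_x = P(X \preccurlyeq x)/p_x$.

```python
def influence_tau(y_precedes_x, x_precedes_y, tau_x, p_x):
    """Influence function of the partial quantile index tau_x at y."""
    comparable = y_precedes_x or x_precedes_y
    return (float(y_precedes_x) - tau_x * float(comparable)) / p_x


def influence_p(y_precedes_x, x_precedes_y, p_x):
    """Influence function of the probability of comparison p_x at y."""
    return float(y_precedes_x or x_precedes_y) - p_x


def contaminated_tau(prob_below, p_x, y_precedes_x, x_precedes_y, eps):
    """tau_x under F_eps = eps * delta_y + (1 - eps) * F.

    prob_below is P(X precedes x) under F, and p_x is P(C(x)) under F.
    """
    comparable = float(y_precedes_x or x_precedes_y)
    numerator = (1.0 - eps) * prob_below + eps * float(y_precedes_x)
    denominator = (1.0 - eps) * p_x + eps * comparable
    return numerator / denominator


if __name__ == "__main__":
    prob_below, p_x = 0.2, 0.5
    tau_x = prob_below / p_x
    eps = 1e-6
    for flags in [(True, False), (False, True), (False, False)]:
        finite_diff = (contaminated_tau(prob_below, p_x, *flags, eps) - tau_x) / eps
        print(flags, influence_tau(*flags, tau_x, p_x), finite_diff)
```

The three cases correspond to $y$ preceding $x$, $x$ preceding $y$, and $y$ incomparable to $x$; the location of $y$ within each case plays no role, and both functions are bounded whenever $p_x$ is bounded away from zero.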

4.2. Characterization properties. One important question is whether the 
partial quantile quantities characterize the underlying probability distribu- 
tion, as univariate quantiles do in the univariate case. The answer relies on 
the richness of the partial order. 

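A family of sets $\mathcal{E}$ is said to be a determining class if for any two probability
measures $\mu, \nu$ such that $\mu(E) = \nu(E)$ for all $E \in \mathcal{E}$, we have $\mu = \nu$. Reference
[17] contains properties and definitions of determining classes, which
are a well studied topic in probability theory [2, 54, 55]. The classic example
of a determining class for probability measures on $\mathbb{R}^d$ is the family of lower
orthants $\{\{y \in \mathbb{R}^d : y \leq x\} : x \in \mathbb{R}^d\}$, whose probabilities define the cumulative
distribution function.
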
By definition of probabilities of comparison and partial quantile indices, 
we have the identity 

$$p_x \tau_x = P(X \preccurlyeq x).$$

Thus, if the family of sets $\{X \preccurlyeq x\}$, $x \in S$, is a determining class, the probabilities
of comparison and partial quantile indices characterize the underlying
measure.

Theorem 5. If the family of sets $\mathcal{M}(\preccurlyeq) = \{\{y \in S : y \preccurlyeq x\} : x \in S\}$ is
a determining class, then partial quantile indices and probabilities of comparison
uniquely determine the probability distribution.

Below we show that partial orders described in Examples 1 and 2 lead to 
partial quantiles that characterize the probability measure. 

Lemma 4. If $y \preccurlyeq x$ only if $x - y \in K$ where $K$ is a proper convex cone,
as in Example 1, we have that $\mathcal{M}(\preccurlyeq)$ is a determining class.

Lemma 5. If the partial order is given by an acyclic directed graph, as
in Example 2, we have that $\mathcal{M}(\preccurlyeq)$ is a determining class.

Recall that a binary relation is said to be antisymmetric if $x \succcurlyeq y$ and
$y \succcurlyeq x$ imply that $x = y$. In general, antisymmetry is a necessary
condition for the probability measure to be characterized by the partial
quantiles. Otherwise, any transfer of probability mass within indifferent
points $x \sim y$ would not change probabilities of comparison and partial quantile
indices. Partial orders are antisymmetric by definition.

4.3. Monotonicity and partial quantiles. Recall that for univariate quantiles
with the natural ordering, estimated quantiles are nondecreasing. In this
section, we consider monotonicity properties with respect to the partial order 
of the estimated partial quantile surfaces and points. Similar to the standard 
univariate quantile case, such properties are valuable for interpretation and 
applicability of the partial quantile concept. 

We start with a positive result for the estimation of partial quantile sur- 
faces. The following result states that the transitivity in the partial order 
translates into monotonicity of the estimated partial quantile indices. The- 
orem 6 below is analogous to Proposition 3 but deals with estimated partial 
quantile indices instead of the true partial quantile indices. 

Theorem 6. Assume that the binary relation is transitive. Then, if $x \succcurlyeq y$
we have $\hat{\tau}_x \geq \hat{\tau}_y$.

Next, we turn to partial quantile points where monotonicity is more delicate.
In this section our interest lies in cases for which the true partial
quantile points are partial-monotone, that is,

(4.1) $$x_\tau \succcurlyeq x_{\tau'} \quad \text{if } \tau \geq \tau'.$$

In particular, under transitivity, this implies that $x_\tau$ is unique for every
$\tau \in (0,1)$. In general, the true partial quantile points might not be partial-monotone
with respect to the partial order (e.g., Example 5).

However, even if the true partial quantile points are partial-monotone in
the sense of (4.1), the estimated partial quantile points might violate this
partial-monotonicity due to estimation error.$^1$ A similar lack of monotonicity
is observed in quantile regression when conditional quantile curves are being
estimated; see Koenker [32]. The result of this section is motivated by techniques
recently developed to correct the lack of monotonicity of estimated
conditional quantile curves in Chernozhukov, Fernández-Val and Galichon
[10, 11] and Neocleous and Portnoy [39].

Unlike the quantile index result mentioned above, which makes no assumption
on the space, additional structure is needed on the pair $(S, \succcurlyeq)$. Based
on the partial order, define the operations $\vee$ and $\wedge$, which denote the least
upper bound and the greatest lower bound, respectively, of any two points
in $S$ (these are also referred to as the "join" and the "meet"). We assume
that $(S, \succcurlyeq)$ is a lattice space, that is, $S$ is closed under $\wedge$ and $\vee$. For example,
$(\mathbb{R}^d, \geq)$ is a lattice space under the operations

$$x \wedge y = (x_1 \wedge y_1, \ldots, x_d \wedge y_d) \quad \text{and} \quad x \vee y = (x_1 \vee y_1, \ldots, x_d \vee y_d).$$

$^1$ This can be observed in Figure 6 in Section 5, where the partial quantile points
for the uniform distribution over the unit square are estimated. A close inspection of
Figure 6 shows that $\hat{x}_{0.35} = (0.39, 0.44)$ and $\hat{x}_{0.4} = (0.47, 0.42)$, which violates the
partial-monotonicity condition (4.1) although the true partial quantile points satisfy (4.1), as can
be seen from Example 4 in Section 5.

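Given an initial estimator $\{\hat{x}_\tau : \tau \in (0,1)\}$, we define its minorant and
majorant, respectively, as

(4.2) $$\hat{x}^{\wedge}_\tau = \bigwedge_{\tau' \geq \tau,\ \tau' \in (0,1)} \hat{x}_{\tau'} \quad \text{and} \quad \hat{x}^{\vee}_\tau = \bigvee_{\tau' \leq \tau,\ \tau' \in (0,1)} \hat{x}_{\tau'}.$$

Note that by construction, $\hat{x}^{\wedge}_\tau$ and $\hat{x}^{\vee}_\tau$ are partial-monotone. They can
be thought of as lower and upper envelopes constructed based on the initial
estimator. Also note that if $\hat{x}_\tau$ is partial-monotone, then we would have
$\hat{x}^{\wedge}_\tau = \hat{x}_\tau = \hat{x}^{\vee}_\tau$.

4.3.1. Rearrangement and the case $(\mathbb{R}^d, \geq)$. Due to its importance in applications,
we carry over a monotonization scheme for the case of $S = \mathbb{R}^d$
with the partial order being induced by the convex cone $K = \mathbb{R}^d_+$. The particular
structure of the cone is such that $K = \mathbb{R}_+ \times \cdots \times \mathbb{R}_+$ is the cartesian
product of the natural order.

A possible monotonization scheme is given by a componentwise rearrangement,
namely

$$\hat{x}^r_{\tau,j} = \inf\left\{ y \in \mathbb{R} : \int_0^1 1\{\hat{x}_{u,j} \leq y\} \, du \geq \tau \right\}, \quad j = 1, \ldots, d.$$

Note that $\hat{x}^r_\tau$ is such that $\hat{x}^{\wedge}_\tau \leq \hat{x}^r_\tau \leq \hat{x}^{\vee}_\tau$. We have the following result.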
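Theorem 7. Assume that $x_\tau$ is partial-monotone. Then, for any $\kappa \geq 1$,

$$\left( \int_0^1 \sum_{j=1}^d |\hat{x}^r_{\tau,j} - x_{\tau,j}|^\kappa \, d\tau \right)^{1/\kappa} \leq \left( \int_0^1 \sum_{j=1}^d |\hat{x}_{\tau,j} - x_{\tau,j}|^\kappa \, d\tau \right)^{1/\kappa}$$

with probability one.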

Chernozhukov, Fernández-Val and Galichon [10] had previously derived
this improvement in the estimation by using rearrangement in the estimation
of monotone functions (of which univariate conditional quantiles are
a particular case).

The usefulness of Theorem 7 is twofold. On the one hand, it states that
we always improve in terms of the $L^\kappa$-norm with respect to the original
estimator. On the other hand, it allows us to check if the partial-monotone
assumption is valid.

Corollary 3. Assuming that $x_\tau$ is partial-monotone, for any $\kappa \geq 1$ we
have

$$\left( \int_0^1 \sum_{j=1}^d |\hat{x}^r_{\tau,j} - \hat{x}_{\tau,j}|^\kappa \, d\tau \right)^{1/\kappa} \leq 2 \left( \int_0^1 \sum_{j=1}^d |\hat{x}_{\tau,j} - x_{\tau,j}|^\kappa \, d\tau \right)^{1/\kappa}.$$

Consequently, if

$$\left( \int_0^1 \sum_{j=1}^d |\hat{x}^r_{\tau,j} - \hat{x}_{\tau,j}|^\kappa \, d\tau \right)^{1/\kappa} > 2 \left( \int_0^1 \sum_{j=1}^d |\hat{x}_{\tau,j} - x_{\tau,j}|^\kappa \, d\tau \right)^{1/\kappa},$$

$x_\tau$ is not partial-monotone.

Note that if conditions E.3 and E.4 are satisfied with $d(x, y) = \|x - y\|_\kappa =
(\sum_{j=1}^d |x_j - y_j|^\kappa)^{1/\kappa}$, the right-hand side of the expression above can be
bounded by the rate of convergence of Theorem 2. Therefore, although
Corollary 3 is not a formal statistical test, it can provide evidence for the
lack of partial-monotonicity of partial quantile points since we can compute
the $L^\kappa$ distance between $\hat{x}^r_\tau$ and $\hat{x}_\tau$. The lack of partial-monotonicity of partial
quantile points can arise due to nonuniqueness of partial quantile points.
(In general, it can also arise if the binary relation is not transitive.)
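
The envelopes in (4.2) and the componentwise rearrangement can be sketched for an estimator evaluated on an equally spaced grid of quantile indices in $(\mathbb{R}^d, \geq)$. This is a minimal discrete analog, assuming that on such a grid the rearrangement reduces to sorting each coordinate and that the integrals over $\tau$ are replaced by grid averages. The function names and the "truth" path in the usage example are hypothetical; the two estimated points are the ones quoted in the footnote on Figure 6.

```python
def minorant(path):
    """Lower envelope: componentwise meet over all grid indices tau' >= tau."""
    out = []
    running = path[-1]
    for point in reversed(path):
        running = tuple(min(a, b) for a, b in zip(running, point))
        out.append(running)
    out.reverse()
    return out


def majorant(path):
    """Upper envelope: componentwise join over all grid indices tau' <= tau."""
    out = []
    running = path[0]
    for point in path:
        running = tuple(max(a, b) for a, b in zip(running, point))
        out.append(running)
    return out


def rearrange(path):
    """Componentwise rearrangement: sort each coordinate along the tau grid."""
    columns = [sorted(column) for column in zip(*path)]
    return [tuple(row) for row in zip(*columns)]


def l_kappa_distance(path_a, path_b, kappa=2.0):
    """Grid approximation of the L^kappa distance between two paths."""
    total = sum(abs(a - b) ** kappa
                for pa, pb in zip(path_a, path_b)
                for a, b in zip(pa, pb))
    return (total / len(path_a)) ** (1.0 / kappa)


if __name__ == "__main__":
    estimate = [(0.39, 0.44), (0.47, 0.42)]   # violates partial-monotonicity
    truth = [(0.40, 0.40), (0.45, 0.45)]      # hypothetical monotone target
    fixed = rearrange(estimate)
    print(minorant(estimate), fixed, majorant(estimate))
    print(l_kappa_distance(fixed, truth), l_kappa_distance(estimate, truth))
```

In this small example the rearranged path lies between the two envelopes and is closer to the monotone target than the original estimate, in line with Theorem 7. The distance between the rearranged and original paths is the computable quantity that Corollary 3 suggests monitoring.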

4.4. Independence, natural ordering and concentration of measure. Note
that in general, even if the components are independent, partial quantiles
can reflect a dependence created by the partial order. However, if the partial
order is given by the componentwise natural order, some independence carries
over. The next result specializes to the case where $(S, \succcurlyeq)$ is $(\mathbb{R}^d, \geq)$ and
$X$ is an $\mathbb{R}^d$-valued random variable whose components are independent with
no point mass. In the following, let $q_X(\tau) = (q_{X_1}(\tau), q_{X_2}(\tau), \ldots, q_{X_d}(\tau))'$ denote
the vector whose components are the $\tau$-quantiles of the components
of $X$.

Theorem 8 (Independence, concentration of measure and partial quantile
points). Consider an $\mathbb{R}^d$-valued random variable $X$ with no point mass
and the natural partial order $\geq$. If the components of $X$ are independent,
then the partial quantile points (2.3) satisfy

$$x_\tau = q_X\left( \frac{\tau^{1/d}}{\tau^{1/d} + (1 - \tau)^{1/d}} \right) \quad \text{and} \quad p_\tau = \frac{1}{(\tau^{1/d} + (1 - \tau)^{1/d})^d}$$

for all $\tau \in (0,1)$.

In particular, we have $x_{0.5} = q_X(0.5)$, and for any $\ell_\kappa$-norm we have
$\|x_\tau - x_{0.5}\|_\kappa \leq \|q_X(\tau) - q_X(0.5)\|_\kappa$ for all $\tau \in (0,1)$.

Theorem 8 leads to $x_{0.5} = (q_{X_1}(0.5), q_{X_2}(0.5), \ldots, q_{X_d}(0.5))'$, the vector
of componentwise medians, which is intuitively reasonable in terms of the
geometry. Moreover, we observe that for $d \geq 1$,

$$\left| \frac{\tau^{1/d}}{\tau^{1/d} + (1 - \tau)^{1/d}} - \frac{1}{2} \right| \leq \left| \tau - \frac{1}{2} \right|,$$

so that under independence, partial quantiles are always closer to the median
than univariate quantiles. Therefore, partial quantiles exhibit a concentration
of measure phenomenon under independence and this partial order.
However, the case of $\tau = 0.5$ also leads to $p_\tau = 1/2^{d-1}$, which decreases
exponentially fast in the dimension $d$. In contrast, as $\tau$ becomes extreme (i.e.,
$\tau$ converges to zero or one), $p_\tau$ approaches one. The simplicity of the $d = 1$
case follows from the fact that all points are comparable. We typically lose
this advantage as soon as $d > 1$, and the degree to which increases in $d$
make comparisons less likely depends on the partial order, the probability
distribution, and the value of $\tau$. This illustrates a "concentration of measure
phenomenon" and a "curse of dimensionality for comparisons."
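
The two closed-form expressions in Theorem 8 are easy to tabulate. The sketch below, with hypothetical function names, evaluates the univariate level at which each marginal quantile is taken and the associated probability of comparison, so that the shrinkage toward the median and the decay $1/2^{d-1}$ at $\tau = 0.5$ can be inspected for any $d$.

```python
def shrunk_level(tau, d):
    """Univariate level u such that x_tau = q_X(u) under independence."""
    a = tau ** (1.0 / d)
    b = (1.0 - tau) ** (1.0 / d)
    return a / (a + b)


def comparison_probability(tau, d):
    """Probability of comparison p_tau at the partial quantile point."""
    a = tau ** (1.0 / d)
    b = (1.0 - tau) ** (1.0 / d)
    return 1.0 / (a + b) ** d


if __name__ == "__main__":
    for d in (1, 2, 5, 10):
        row = [(round(shrunk_level(t, d), 3), round(comparison_probability(t, d), 3))
               for t in (0.1, 0.5, 0.9)]
        print(d, row)
```

For $d = 1$ the level equals $\tau$ and the probability of comparison equals one, which recovers the univariate case.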

Comment 4.1 [Impact of correlations under $(\mathbb{R}^d, \geq)$]. Under $(\mathbb{R}^d, \geq)$, if
the components of $X$ are positively correlated, the probabilities of comparison
tend to be larger than under independence. However, under negative
correlation, the probabilities of comparison tend to be smaller than under
independence. These reflect cases in which the distributions are more or less
aligned with the partial order.

Comment 4.2 (Perfect positive correlation). In the case $(\mathbb{R}^d, \geq)$, if
(strictly) monotone transformations of the components of $X$ are perfectly
positively correlated, we have $x_\tau = q_X(\tau)$ and $p_\tau = 1$ for every $\tau \in (0,1)$.
This is a trivial case in which multivariate partial quantiles collapse into the
univariate quantiles. Not surprisingly, the concentration of measure statement
is satisfied with equality.

Next, we turn to partial quantile indices which also exhibit a concentration 
of measure under independence. 



Theorem 9 (Independence, concentration of measure and partial quantile
indices). Consider an $\mathbb{R}^d$-valued random variable $X$ with no point mass
and the natural partial order $\geq$. If the components of $X$ are independent,
then the partial quantile indices (2.1) satisfy

$$P(\tau_X \leq \tau) = P\left( \sum_{j=1}^d Z_j \leq \log \frac{\tau}{1 - \tau} \right),$$

where $Z_j$ are independent logistic random variables with zero mean and
variance $\pi^2/3$.

In particular, we have that $P(\tau_X \geq 1/2) = 1/2$ and that $\tau_X$ concentrates
on extreme quantiles with respect to the dimension. Namely, for any positive
number $C$,

P(\t x - 0.5| < 0.5 - CVT 1 / 2 ) < 1/C. 

Theorem 9 yields a concentration of measure for partial quantile indices 
under independence. As the dimension grows, a realization of the random 
variable is more likely to have an extreme partial quantile index. Equiva- 
lently, a realization of the random variable is likely to belong to a partial 
quantile surface $Q(\tau)$ for $\tau$ close to zero or one. This has close connections
to the concentration of measure for a uniform distribution over the
$d$-dimensional unit cube, where most of the mass concentrates on corners.
In our case, corners correspond to the extremes zero or one.
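
A small simulation can make the concentration in Theorem 9 visible. Under independence with no point mass, the marginal distribution function values $F_j(X_j)$ are independent uniforms, so the partial quantile index of a draw is the product of these values divided by that product plus the product of their complements. The sketch below uses hypothetical function names, and the margin of 0.1 is an arbitrary illustrative choice. It estimates the share of draws whose index lies within that margin of zero or one.

```python
import random


def partial_index_independent(u):
    """tau_X for a draw whose marginal CDF values are u_j = F_j(X_j)."""
    below = 1.0   # probability that an independent copy precedes the draw
    above = 1.0   # probability that an independent copy succeeds the draw
    for uj in u:
        below *= uj
        above *= 1.0 - uj
    return below / (below + above)


def extreme_share(d, n_draws, margin=0.1, seed=0):
    """Monte Carlo share of draws with tau_X < margin or tau_X > 1 - margin."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        tau = partial_index_independent([rng.random() for _ in range(d)])
        if tau < margin or tau > 1.0 - margin:
            hits += 1
    return hits / n_draws


if __name__ == "__main__":
    for d in (1, 2, 5, 20):
        print(d, extreme_share(d, 5000))
```

For $d = 1$ the index is uniform, so the share is about 0.2; it increases with the dimension, in line with the logistic-sum representation in Theorem 9.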

Comment 4.3 [$Q(\tau)$ as a partially-efficient frontier]. The notion of
a partial quantile surface can be connected with that of an efficient frontier.
A point $x \in S$ is said to be in the efficient frontier of $E$ with respect to a partial
order if there is no point $x' \in E$ that dominates $x$ in terms of the partial
order. The definition of partial quantile surfaces allows us to generalize the
concept of efficient frontiers for random variables. In this case, the support
of the possible realizations of $X$ plays the role of the set $E$. We can interpret
the partial quantile surfaces $Q(\tau)$ as partially-efficient frontiers parametrized
by $\tau$, the probability of drawing a preceding point conditional on it being
a comparable point. Partially-efficient frontiers for high values of $\tau$ are likely
to be of particular interest. It might be quite difficult to reach a point on
the efficient frontier but much easier to reach a point on a partially-efficient
frontier with $\tau$ close to but not equal to one (as shown by Theorem 9 under
independence). In such cases, the partially-efficient frontier notion might be
quite appealing. In particular, if the support of $X$ is $\mathbb{R}^d$, partially-efficient
frontiers are meaningful while the efficient frontier is empty.

4.5. Partial quantile regions. One common use of univariate quantiles is
to provide measures of dispersion. In this section, we propose an approach to
build such measures of dispersion based on the partial quantiles. Traditionally,
a measure of dispersion would be centered on the median and expanded
to extreme quantiles. In the univariate case, for instance, Serfling [50] advocates
the interval

(4.3) $$I(\kappa) = \left[ Q\left( \frac{1 - \kappa}{2} \right), Q\left( \frac{1 + \kappa}{2} \right) \right], \quad \kappa \in [0,1],$$

to measure the dispersion of a random variable. With $\kappa = 0$, $I(0)$ is the median,
and as $\kappa$ increases from zero to one we obtain an interval with probability
at least $\kappa$.

In the extension to the multivariate case, we shift from "interval" to
"region." Moreover, in order to use partial quantiles, we need to specify not
only the quantiles but also the minimum probability of comparison in which
we are interested. We define the partial quantile region of levels $\theta \in [0,1]$
and $\eta \in [0,1]$ as

(4.4) $$\mathcal{R}(\theta, \eta) = \left\{ x \in S : P(X \preccurlyeq x \mid C(x)) \geq \frac{1 - \theta}{2},\ P(X \succcurlyeq x \mid C(x)) \geq \frac{1 - \theta}{2},\ p_x \geq (1 - \eta) \cdot p_{\tau_x} \right\}.$$

These regions consist of points that are "typical," that is, nonextreme
partial quantiles with respect to the given partial order, which are more
comparable to other points. Thus, partial quantile regions can help characterize
dispersion around typical and comparable points.

The family of sets $\mathcal{R}$ is such that $\mathcal{R}(\theta, \eta) \subset \mathcal{R}(\theta', \eta')$ whenever $\theta \leq \theta'$ and
$\eta \leq \eta'$. By definition, $\mathcal{R}(\theta, 0)$ contains only the partial quantile points for
indices $\tau \in [(1 - \theta)/2, (1 + \theta)/2]$. On the other hand, $\mathcal{R}(\theta, 1)$ contains all the
partial quantile surfaces for indices $\tau \in [(1 - \theta)/2, (1 + \theta)/2]$. Note that if we
do not constrain the probability of comparisons, we would obtain unbounded
regions in some situations. In the univariate case with the natural order (i.e.,
a complete order holds), we recover (4.3) since $p_x = 1$ for every $x \in \mathbb{R}$.
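
Membership in the region (4.4) only requires three numbers for a given point: its partial quantile index, its probability of comparison, and the maximal probability of comparison on its partial quantile surface. The sketch below encodes the three inequalities as a predicate; the function and argument names are hypothetical, and the inputs are assumed to be known or previously estimated.

```python
def in_partial_quantile_region(tau_x, p_x, p_at_tau_x, theta, eta):
    """Check the three inequalities that define the region R(theta, eta).

    tau_x:      partial quantile index of x, P(X precedes x | C(x)).
    p_x:        probability of comparison of x.
    p_at_tau_x: largest probability of comparison on the surface Q(tau_x).
    """
    lower = (1.0 - theta) / 2.0
    return (tau_x >= lower
            and 1.0 - tau_x >= lower
            and p_x >= (1.0 - eta) * p_at_tau_x)


if __name__ == "__main__":
    # A partial median that attains the maximal comparability belongs to R(0, 0).
    print(in_partial_quantile_region(0.5, 0.5, 0.5, theta=0.0, eta=0.0))
    # An extreme index is excluded unless theta is large enough.
    print(in_partial_quantile_region(0.9, 1.0, 1.0, theta=0.5, eta=0.0))
```

Increasing either level can only turn a negative answer into a positive one, which is the nesting property of the family of regions.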

In order to endow the partial quantile region with some probability coverage,
we fix a nondecreasing function $g : [0,1] \to [0,1]$ such that $g(0) = 0$ and
$g(1) = 1$. (A simple rule would be to set $\eta = \theta$.) Define

$$\theta^*_\kappa = \inf\{\theta : P(X \in \mathcal{R}(\theta, g(\theta))) \geq \kappa\},$$

and let the dispersion region be

$$\mathcal{R}(\kappa) = \mathcal{R}(\theta^*_\kappa, g(\theta^*_\kappa)).$$

Therefore, the family $\{\mathcal{R}(\kappa) : \kappa \in [0,1]\}$ satisfies the following properties:

(i) Nested property. This family of sets is nested, $\mathcal{R}(0) = Q^*(0.5)$ and
$\mathcal{R}(1) = S$;

(ii) Coverage property. $\mathcal{R}(\kappa)$ is the smallest set in the family with probability
at least $\kappa$;

(iii) Ordering property. Any element $x \in \mathcal{R}(\kappa)$ satisfies $|\tau_x - 0.5| \leq \theta^*_\kappa/2$;

(iv) Comparability property. Any element $x \in \mathcal{R}(\kappa)$ satisfies
$p_x \geq (1 - g(\theta^*_\kappa)) \cdot p_{\tau_x}$.

Comment 4.4. With respect to the estimation of (4.4), results in Section 3
can be directly applied to estimate $\mathcal{R}(\theta, \eta)$ uniformly on $\theta \in [0, 1 - \epsilon]$
and $\eta \in [0, 1 - \epsilon]$, where $\epsilon > 0$ is fixed or goes to zero sufficiently slowly.

4.6. Efficient computation. In this section, we turn our attention to the
question of whether the computation of the partial quantiles (2.3) can be
performed efficiently. The notion of efficiency we use is the one in the computational
complexity literature, that is, that it can be computed in polynomial
time with the "size" of the problem (usually the dimension of $S$; see
[4, 23, 38]).

Such a question is usually tied to regularity conditions on the relevant ob- 
jects (in this case, on the probability distribution and on the partial order) 
and on the representation of the relevant objects. For example, the partial 
order could be given only by an oracle: for every two points in S, the or- 
acle returns the better point or reports that the points are incomparable. 
Alternatively, it could have an explicit format that allows us to exploit addi- 
tional structure (a similar idea holds for the representation of the probability 
distribution of the random variable). 

A simple result pertains to the case when $S$ has a finite number of
elements.

Lemma 6. Assume that the cardinality of $S$ is finite, that we can compute
$P(\{x\})$ for every $x \in S$, and that we can evaluate the partial order for
any pair of points in $S$. Then we can compute all the partial quantiles in at
most $O(|S|^2)$ operations.

Lemma 6 explicitly evaluates all points in $S$. Therefore, it might be problematic
to rely on it when the cardinality of $S$ is large. Moreover, we emphasize
that Lemma 6 does not provide any information regarding the case
where $S$ is not finite. A simple discretization of $S \subset \mathbb{R}^d$ would typically
suffer from the curse of dimensionality (e.g., computational requirements
would be larger than $1/\epsilon^d$). It is not surprising that the general case cannot
be computed efficiently.
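
The quadratic bound in Lemma 6 can be illustrated by a direct double loop over a finite space. The following sketch, with hypothetical names, tabulates the partial quantile index and the probability of comparison of every point using two order evaluations per pair. It assumes a reflexive relation and takes $\tau_x = P(X \preccurlyeq x)/p_x$; selecting partial quantile points from this table is left out.

```python
def partial_indices_finite(points, prob, precedes):
    """Return {x: (tau_x, p_x)} for a finite space.

    points:   list of hashable elements of S.
    prob:     dict mapping each element to its probability mass.
    precedes: function (a, b) -> True if a precedes b in the partial order.
    """
    table = {}
    for x in points:
        below = above = ties = 0.0
        for y in points:
            y_le_x = precedes(y, x)
            x_le_y = precedes(x, y)
            if y_le_x:
                below += prob[y]
            if x_le_y:
                above += prob[y]
            if y_le_x and x_le_y:
                ties += prob[y]   # counted twice above, so remove once below
        p_x = below + above - ties
        tau_x = below / p_x if p_x > 0 else float("nan")
        table[x] = (tau_x, p_x)
    return table


if __name__ == "__main__":
    corners = [(0, 0), (0, 1), (1, 0), (1, 1)]
    mass = {c: 0.25 for c in corners}
    order = lambda a, b: all(ai <= bi for ai, bi in zip(a, b))
    for point, values in partial_indices_finite(corners, mass, order).items():
        print(point, values)
```

In the usage example, the two corners $(0,1)$ and $(1,0)$ are incomparable to each other, so each has probability of comparison $3/4$, while the extreme corners are comparable to every point.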

Example 3. Let $S = [0,1]^d$ be the unit cube, and assume that the binary
relation is such that $x$ and $y$ are incomparable for all $x, y$ different from
an unknown point $x^* \in S$ for which $P(X \succcurlyeq x^* \mid C(x^*)) = P(X \preccurlyeq x^* \mid C(x^*)) =
1/2$. With no additional information, it is not possible to approximate $x^*$
efficiently with any deterministic method. On the other hand, probabilistic
methods have an exponentially small chance of ever being close to $x^*$. (This
computational problem is equivalent to maximizing a discontinuous function
over the unit cube.)

Note that Example 3 is an extreme and, arguably, uninteresting case.
There are many interesting cases for which additional structure is available
and can be exploited. Here we will provide sufficient regularity/representation
conditions on the probability distribution and on the partial order to allow
efficient computation of partial quantiles that require the maximization of
the probability of drawing a comparable point over a subset of $S$. These
conditions cover many relevant cases.

Our analysis relies on the following two regularity conditions, one for the 
probability distribution and another for the partial order: 

C.1. Condition on the probability density function. Let $S = \mathbb{R}^d$ and let
the probability density function $f$ of the random variable $X$ be log-concave.
That is, for every $x, y \in S$ and $\lambda \in [0,1]$, we have

$$f(\lambda x + (1 - \lambda) y) \geq f(x)^\lambda f(y)^{1 - \lambda}.$$

C.2. Condition on the partial order. For every $x, y \in S$, we have

(4.5) $$x \succcurlyeq y \text{ only if } x - y \in K,$$

where $K$ is a convex cone with nonempty interior.

In particular, condition C.1, log-concavity of $f$ over $S$, implies that $S$ is
convex. Moreover, a log-concave density function is unimodal, a useful property
to achieve computational tractability. This is needed because of the
representation model we will be using. Following the literature on computational
complexity for Markov chain Monte Carlo (see Vempala [58] for
a survey), we assume that we can evaluate the density function $f$ at any
given point. Nonetheless, the class of log-concave density functions covers
many cases of interest, including Gaussian and uniform distributions over
convex sets. As illustrated by Example 3, the restriction to log-concave distributions
alone is not sufficient to ensure good computational properties.
Condition C.2 provides sufficient regularity conditions. The partial orders
allowed in (4.5) cover many cases of practical interest, with $K$ being equal
to the nonnegative orthant or the cone of positive semidefinite matrices.
Now we can state a key equivalence lemma for partial quantile points under
these regularity conditions. It allows us to replace the function $p_x$ by a variable
$p \in [0,1]$ in the formulation of partial quantile points under C.1 and C.2,
which simplifies the optimization problem considerably.

Lemma 7. Assume that conditions C.1 and C.2 hold. Then the optimization
problem formulation in (2.3) is equivalent to the following optimization
problem:

(4.6) $$(p_\tau, x_\tau) \in \arg\max_{p, x}\ p \quad \text{s.t.} \quad P(X \succcurlyeq x) \geq (1 - \tau) p, \quad P(X \preccurlyeq x) \geq \tau p, \quad x \in S,\ 0 \leq p \leq 1.$$

Table 1

The hit-and-run method is a random walk that takes as input a covariance matrix $T_i$,
an initial point $(V_i, X_i)$, a probability density function $g_i$, and a membership oracle
for a convex set $H(p)$. The output is a random point whose distribution is approximately
$g_i$ restricted to $H(p)$. The simulated annealing procedure changes the power
to which the objective function is raised, $g_i(v, x) = \exp(a_i v)$, so that the probability mass
concentrates on the maximum (starting from near uniform). The final output is a point
$X^* \in H(p)$ such that with probability $1 - \delta$, $p_{X^*} \geq (1 - \epsilon) p_\tau$. The optimization algorithm
is based on Kalai and Vempala [29] and Lovász and Vempala [36]

Optimization algorithm 

Step 0. Let p < p T , S G (0, 1), set m = [Vdln 2pT(d +' £ n(1/<5)) ] , k = \c dlog 5 d] and 

ai : = ^(1 + -j^y and gi(y,x) = exp(ajw), for i = 1, . . . ,m. 
Step 1. Let $(V_0^1, X_0^1), \ldots, (V_0^k, X_0^k)$ be independent uniform random points from

$$H(p) := \left\{ (v, x) \in \mathbb{R} \times S : \log P(X \succcurlyeq x) \geq \log(1 - \tau) + v,\ \log P(X \preccurlyeq x) \geq \log \tau + v,\ \log p \leq v \leq 0 \right\}$$

and let $T_0$ be their empirical covariance matrix.
Step 2. For $i = 1, \ldots, m$ do the following:

Get independent random samples $(V_i^1, X_i^1), \ldots, (V_i^k, X_i^k)$ from $g_i$ on $H(p)$,
using hit-and-run with covariance matrix $T_i$, starting from $(V_{i-1}^1, X_{i-1}^1), \ldots,
(V_{i-1}^k, X_{i-1}^k)$, respectively. Set $T_{i+1}$ to be the empirical covariance matrix of
$(V_i^1, X_i^1), \ldots, (V_i^k, X_i^k)$.

Step 3. Output $\max_{j = 1, \ldots, k} p_{X_m^j}$ and the maximizer point $X^*$.



An important consequence of Lemma 7, due to the log-concavity assumption,
is that by a simple change of variable $p = \exp(v)$, (4.6) can be recast
as a convex programming problem. We will be interested in computing
an $\epsilon$-approximate solution, that is, a point $x_\epsilon \in S$ such that $|\tau_{x_\epsilon} - \tau| \leq \epsilon$ and
$p_{x_\epsilon} \geq p_\tau (1 - \epsilon)$.

It is helpful to first consider the case that a membership oracle to evaluate
$P(X \preccurlyeq x)$ and $P(X \succcurlyeq x)$ is available. In that case, because of Lemma 7, we
can directly use the random walks and simulated annealing proposed in Kalai
and Vempala [29] and Lovász and Vempala [36] to compute an approximate
maximizer. Table 1 displays the efficient algorithm.

In the case that only a membership oracle for the probability density
function $f$ is available, we can efficiently approximate $P(X \preccurlyeq x)$ and $P(X \succcurlyeq x)$
by a factor of $1 + \epsilon$, again by random walks and simulated annealing
as proposed in Lovász and Vempala [36]. This can be used in the above
algorithm to construct the following result.

Theorem 10. Assume that conditions C.1 and C.2 hold. If we have
a membership oracle to evaluate the probability density function and to evaluate
the partial order, then for every precision $\epsilon > 0$, with probability $1 - \delta$
we can compute an $\epsilon$-solution for a $\tau$-partial quantile in time polynomial in $d$,
$\ln(1/\delta)$ and $1/(p_\tau \epsilon)$.

Theorem 10 establishes that conditions C.1 and C.2 are sufficient for the
existence of an efficient probabilistic method to approximate partial quantile
points.
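
To see formulation (4.6) at work in a case where the answer is known, the sketch below solves it by brute-force grid search for the uniform distribution on the unit square with the componentwise order, which satisfies C.1 and C.2. For a fixed $x$, the largest feasible $p$ is the minimum of $1$, $P(X \succcurlyeq x)/(1 - \tau)$ and $P(X \preccurlyeq x)/\tau$. The grid search stands in for the hit-and-run and simulated annealing procedure of Table 1 and is only practical in very low dimension; the function names are hypothetical.

```python
import math


def solve_unit_square(tau, grid=100):
    """Grid search for (4.6) when X is uniform on [0, 1]^2."""
    best_p, best_x = -1.0, None
    for i in range(grid + 1):
        for j in range(grid + 1):
            x1, x2 = i / grid, j / grid
            prob_above = (1.0 - x1) * (1.0 - x2)   # P(X succeeds x)
            prob_below = x1 * x2                   # P(X precedes x)
            # Largest p that satisfies both constraints of (4.6) at this x.
            p = min(1.0, prob_above / (1.0 - tau), prob_below / tau)
            if p > best_p:
                best_p, best_x = p, (x1, x2)
    return best_p, best_x


def closed_form(tau):
    """Values implied by Theorem 8 with d = 2: (p_tau, diagonal coordinate)."""
    s = math.sqrt(tau) + math.sqrt(1.0 - tau)
    return 1.0 / s ** 2, math.sqrt(tau) / s


if __name__ == "__main__":
    for tau in (0.2, 0.5, 0.8):
        print(tau, solve_unit_square(tau), closed_form(tau))
```

Up to the grid resolution, the maximizer sits on the diagonal and the optimal value agrees with the closed form, which gives a simple sanity check for any more scalable solver.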

4.7. Comparison with generalized quantile processes. At this point, it
is clarifying to discuss relations with the interesting work of Einmahl and
Mason [18]. These authors proposed a broad class of generalized quantile
processes

(4.7) $$U(\tau) = \inf\{\lambda(A) : P(A) \geq \tau, A \in \mathcal{A}\}$$

for $\tau \in (0,1)$, where $\lambda$ is a continuous function (usually the volume function)
and $\mathcal{A}$ is a chosen family of sets. Formulation (4.7) does not cover the proposed
approach. In particular, the family of sets in (4.7) is nested in $\tau$. One
important difference is the incorporation of a partial order structure, which
raises issues of incomparability between points, leading to the use of conditional
probabilities. Moreover, the focus of [18] is on the $\mathbb{R}$-valued process
$\{U(\tau) : \tau \in (0,1)\}$. In this work, in addition to the process $\{p_\tau : \tau \in (0,1)\}$,
we are interested in other processes such as $\{x_\tau : \tau \in (0,1)\}$ and $\{\tau_x : x \in S\}$,
which are, respectively, $S$-valued and indexed by $S$.

The generalized quantile process $U : (0,1) \to \mathbb{R}$ as defined in (4.7) is estimated
by

(4.8) $$U_n(\tau) = \inf\{\lambda(A) : P_n(A) \geq \tau, A \in \mathcal{A}\}.$$

Einmahl and Mason [18] establish an asymptotic approximation for the process
$\tau \mapsto \sqrt{n}(U_n(\tau) - U(\tau))$. However, their analysis does not apply to partial
quantiles. For instance, partial quantiles are built upon conditional probabilities
induced by the partial order instead of the original probabilities.
(This is also very different from that of Polonik and Yao [44], for which the
conditioning is fixed within the maximization.) In addition, note that (4.8)
automatically implies that $U_n(s) \leq U_n(t)$ for $s \leq t$, which is likely to fail in
our case. Their analysis relies on a regularity condition that requires $U$ to
be strictly increasing. Regarding their assumptions, they also impose E.1,
E.2, E.4 and E.6. Note that condition E.5 does not appear in Einmahl and
Mason [18] because the objective function is deterministic.

In our context, we would like to estimate the mapping $\tau \mapsto p_\tau$ by its
sample counterpart $\tau \mapsto \hat{p}_\tau$. However, the monotonicity assumption cannot
be invoked in general. In fact, it does not hold in many cases of interest
or under independence as shown in Theorem 8. Moreover, our estimated
partial quantiles involve an objective function that is data dependent, $\hat{p}_x =
P_n(C(x))$, and not a fixed value as the objective function in (4.8). In general,
we will not be able to uniformly estimate the entire function at a $\sqrt{n}$-rate
due to the weaker identification condition, which seems to introduce a bias
even if the $\epsilon_n$ term is zero. As in [18] for the process $\sqrt{n}(U_n(\tau) - U(\tau))$,
one should expect possibly non-Gaussian limits for $\sqrt{n}(\hat{p}_\tau - p_\tau)$ since the
partial quantile points might be nonunique. Since Einmahl and Mason [18]
are interested in $U$, they did not study the convergence property of the points
(sets $A \in \mathcal{A}$ in their framework) that achieve the maximum, as Theorem 2
does. Also, there are no analogs of partial quantile indices in [18].

Finally, note that it is potentially interesting to apply the machinery of
the generalized quantile process of Einmahl and Mason [18] with
$\lambda(A) = \mathrm{volume}(A)$ and
$\mathcal{A} = \{\mathcal{R}(k) : k \in [0,1]\}$, since the sets in
$\mathcal{A}$ are nested. However, unlike in [18], the sets in $\mathcal{A}$
are unknown a priori and also need to be estimated.
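To make the empirical generalized quantile in (4.8) concrete, the following minimal sketch evaluates it in one simple special case chosen here for illustration (it is not taken from [18] or from this paper): the class of sets is assumed to be the closed intervals of the real line and the set function is assumed to be interval length, so that $U_n(\tau)$ is the length of the shortest interval containing at least a fraction $\tau$ of the sample. The function name and the simulated data are hypothetical.

```python
import math
import random


def empirical_generalized_quantile(sample, tau):
    """U_n(tau) when A is the class of closed intervals and lambda is length.

    Returns the length of the shortest interval whose empirical
    probability P_n is at least tau (0 < tau <= 1).
    """
    xs = sorted(sample)
    n = len(xs)
    # Fewest sample points an interval must contain to have P_n >= tau;
    # the small offset guards against floating-point noise in tau * n.
    k = max(1, math.ceil(tau * n - 1e-12))
    # An optimal interval can be taken to start and end at sample points,
    # so scanning windows of k consecutive order statistics is enough.
    return min(xs[i + k - 1] - xs[i] for i in range(n - k + 1))


random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(2000)]
grid = [0.1, 0.25, 0.5, 0.75, 0.9]
values = [empirical_generalized_quantile(sample, t) for t in grid]
for t, u in zip(grid, values):
    print(f"tau = {t:4.2f}   U_n(tau) = {u:.3f}")
```

In this special case the values are nondecreasing in the quantile index by construction, which is the automatic monotonicity noted for (4.8) and which need not hold for the estimated probabilities of comparison.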

5. Illustrative examples. The following examples apply our definitions in
different settings and bring out some possible characteristics of partial
quantiles. Our intention is to provide some intuition regarding the
behavior of $\tau_x$, $Q(\tau)$, $x_\tau$, $p_x$, $p_\tau$ and $p$ in a
variety of cases and to show that the interaction between the partial order
and the probability distribution plays a key role.

Example 4 (Unit square in $\mathbb{R}^2$). Let $X \sim \mathrm{Uniform}([0,1]^2)$,
with $a \succcurlyeq b$ only if $a \ge b$ componentwise. Note that

$$P(X \succcurlyeq x) = (1-x_1)(1-x_2), \qquad P(X \preccurlyeq x) = x_1 x_2$$

and

$$p_x = 1 - x_1 - x_2 + 2x_1x_2$$

characterize the partial quantile indices for every $x \in [0,1]^2$. It
follows that to maximize $p_x$ for $x \in Q(\tau)$, the partial quantile
points are on the diagonal $x_1 = x_2$ and are given by

$$x_\tau = \frac{\sqrt{\tau}}{\sqrt{\tau} + \sqrt{1-\tau}}\,(1,1)' \quad\text{with}\quad p_\tau = \frac{1}{(\sqrt{\tau} + \sqrt{1-\tau})^2}.$$

Figure 1 illustrates the partial quantile indices $\tau_x$ and $p_x$ for each
$x \in [0,1]^2$. The shapes of the partial quantile surfaces can be inferred
from the color bands of partial quantile indices, with each band containing
$Q(\tau)$ for an interval of values of $\tau$. The symmetry leads to the
partial quantiles being on the diagonal, and we can see from the graph of
values of $p_x$ on the diagonal that $p_\tau \to 1$ as $\tau \to 0$ or 1 and
is minimized at the partial median $x_{0.5} = (1/2, 1/2)'$, with $p = 1/2$.

Fig. 1. (a) Partial quantile indices and (b) probabilities of comparison for
$x \in [0,1]^2$ in Example 4.
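The closed forms of Example 4 can be checked numerically. The sketch below assumes the definitions used throughout, namely that the partial quantile index is $P(X \preccurlyeq x)/(P(X \preccurlyeq x) + P(X \succcurlyeq x))$ and the probability of comparison is the sum of the two probabilities; the function names are hypothetical. It solves for the diagonal point with a given index and compares its probability of comparison with the largest value found on a grid over the same level set.

```python
import math


def comparison_probabilities(x1, x2):
    """Return (P(X >= x), P(X <= x)), componentwise, for X ~ Uniform([0,1]^2)."""
    return (1 - x1) * (1 - x2), x1 * x2


def partial_quantile_index(x1, x2):
    """tau_x = P(X <= x | C(x)); undefined only at the corners (0,1) and (1,0)."""
    up, down = comparison_probabilities(x1, x2)
    return down / (up + down)


def probability_of_comparison(x1, x2):
    """p_x = P(C(x)) = P(X >= x) + P(X <= x)."""
    up, down = comparison_probabilities(x1, x2)
    return up + down


def diagonal_coordinate(tau):
    """Solve s^2 / (s^2 + (1 - s)^2) = tau for s in (0, 1)."""
    return math.sqrt(tau) / (math.sqrt(tau) + math.sqrt(1 - tau))


def level_set_point(tau, x1):
    """Second coordinate x2 such that (x1, x2) has partial quantile index tau."""
    c = tau / (1 - tau)
    return c * (1 - x1) / (x1 + c * (1 - x1))


for tau in (0.1, 0.3, 0.5, 0.8):
    s = diagonal_coordinate(tau)
    # Largest p_x found on a grid over the level set {x : tau_x = tau}.
    best = max(
        probability_of_comparison(x1, level_set_point(tau, x1))
        for x1 in (i / 1000 for i in range(1, 1000))
    )
    print(tau, round(s, 4), round(probability_of_comparison(s, s), 4), round(best, 4))
```

On the grid, no point of the level set has a larger probability of comparison than the diagonal point, in line with the statement that the partial quantile points lie on the diagonal.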



Since partial quantiles generalize univariate quantiles under the natural
ordering, they inherit some of the features of univariate quantiles. For
example, multiplicity is possible. However, we note that in a
multidimensional setting with the additional freedom of a partial order, the
set of $\tau$-partial quantiles for a given $\tau$ does not need to be
convex. Multiplicity and nonconvexity of the set of $\tau$-partial quantiles
for a given $\tau$ are illustrated by the next example, which can be thought
of as a mixture of two populations. In the univariate case, mixtures, just as
any other distributions, always lead to convex collections of quantiles.

Example 5 (Nonuniqueness). Consider the random variable

$$X \sim \mathrm{Uniform}\big((-1,1) \times (1,3) \,\cup\, (1,3) \times (-1,1)\big)$$

with $a \succcurlyeq b$ only if $a \ge b$ componentwise. In this case, no
points in the square $(-1,1) \times (1,3)$ can be compared with any point in
the square $(1,3) \times (-1,1)$. This situation leads to nonuniqueness of
the partial quantiles. For $\tau \in (0,1)$, we have

$$x_\tau \in \{(-1 + 2s_\tau,\, 1 + 2s_\tau)',\ (1 + 2s_\tau,\, -1 + 2s_\tau)'\}, \qquad s_\tau = \frac{\sqrt{\tau}}{\sqrt{\tau} + \sqrt{1-\tau}}.$$

Here $p = 1/4$ and $p_\tau < 1/2$ for every $\tau \in (0,1)$, because the two
squares are not in alignment with the partial order. See Figure 2 for the
representation. Moreover, the set of $\tau$-partial quantiles for a given
$\tau$ is not convex. For example, the set of $\tau$-partial quantiles for
$\tau = 1/2$ is $\{(0,2)', (2,0)'\}$. The intuitive geometric notion of a
spatial median would report the point $(1,1)'$, which is not a partial
quantile because it is not comparable with any point in the support of the
distribution and thus has $p_{(1,1)'} = 0$.

Fig. 2. (a) The potential nonuniqueness of the partial quantiles arising from
the partial order (Example 5). (b) The case of the partial order being
aligned with the probability distribution (Example 6).
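The numbers quoted in Example 5 can be reproduced with a few lines of code. The sketch below is an illustration only: it computes $P(X \succcurlyeq x)$ and $P(X \preccurlyeq x)$ exactly for a uniform distribution on a union of disjoint axis-aligned rectangles, with hypothetical helper names, and evaluates them at the two partial medians and at the spatial median $(1,1)'$.

```python
import math

# The two squares of Example 5, each written as ((low1, high1), (low2, high2)).
SQUARES = [((-1.0, 1.0), (1.0, 3.0)), ((1.0, 3.0), (-1.0, 1.0))]


def overlap(lo, hi, a, b):
    """Length of the intersection of [lo, hi] and [a, b]."""
    return max(0.0, min(hi, b) - max(lo, a))


def comparison_probabilities(x, rectangles=SQUARES):
    """(P(X >= x), P(X <= x)) for X uniform on a union of disjoint rectangles."""
    total = sum((b1 - a1) * (b2 - a2) for (a1, b1), (a2, b2) in rectangles)
    up = down = 0.0
    for (a1, b1), (a2, b2) in rectangles:
        up += overlap(x[0], math.inf, a1, b1) * overlap(x[1], math.inf, a2, b2)
        down += overlap(-math.inf, x[0], a1, b1) * overlap(-math.inf, x[1], a2, b2)
    return up / total, down / total


for point in [(0.0, 2.0), (2.0, 0.0), (1.0, 1.0)]:
    up, down = comparison_probabilities(point)
    p_x = up + down
    tau_x = down / p_x if p_x > 0 else None
    print(point, "p_x =", p_x, "tau_x =", tau_x)
```

Both $(0,2)'$ and $(2,0)'$ have index 1/2 with probability of comparison 1/4, while $(1,1)'$ is comparable with a set of probability zero.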



In the next example, which also involves a mixture of two populations,
the probability distribution is better aligned with the partial order.

Example 6 (Aligned distribution and partial order). Consider the random
variable

$$X \sim \mathrm{Uniform}([0,1]^2 \cup [1,2]^2)$$

with $a \succcurlyeq b$ only if $a \ge b$ componentwise. The probabilities of
the events $\{X \succcurlyeq x\}$ and $\{X \preccurlyeq x\}$ are

$$P(X \succcurlyeq x) = \frac{1 + (1-x_1)(1-x_2)}{2}, \qquad P(X \preccurlyeq x) = \frac{x_1 x_2}{2} \qquad \text{for } x \in [0,1]^2$$

and

$$P(X \succcurlyeq x) = \frac{(2-x_1)(2-x_2)}{2}, \qquad P(X \preccurlyeq x) = \frac{1 + (x_1-1)(x_2-1)}{2} \qquad \text{for } x \in [1,2]^2.$$

The partial quantiles can be computed explicitly:

$$x_\tau = \begin{cases}
\dfrac{\sqrt{1 + 4(1/(2\tau) - 1)} - 1}{2(1/(2\tau) - 1)}\,(1,1)' & \text{for } \tau < 1/2,\\[2ex]
(1,1)' & \text{for } \tau = 1/2,\\[1ex]
\left(2 - \dfrac{\sqrt{1 + 4(1/(2(1-\tau)) - 1)} - 1}{2(1/(2(1-\tau)) - 1)}\right)(1,1)' & \text{for } \tau > 1/2.
\end{cases}$$

Note that in contrast to Example 5, we have $p = 3/4$ in this case since the
ordering is somewhat aligned with the distribution [see Figure 2(b)].
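As a sanity check on Example 6, the sketch below evaluates the explicit partial quantile on the diagonal and verifies that its partial quantile index equals the target value. It assumes that each unit square carries probability 1/2 (so the two comparison probabilities carry a factor 1/2) and that the index is $P(X \preccurlyeq x)/(P(X \preccurlyeq x) + P(X \succcurlyeq x))$; function names are hypothetical.

```python
import math


def diagonal_quantile(tau):
    """Coordinate s of the partial quantile s * (1, 1)' in Example 6."""
    if tau == 0.5:
        return 1.0
    if tau < 0.5:
        a = 1.0 / (2.0 * tau) - 1.0
        return (math.sqrt(1.0 + 4.0 * a) - 1.0) / (2.0 * a)
    a = 1.0 / (2.0 * (1.0 - tau)) - 1.0
    return 2.0 - (math.sqrt(1.0 + 4.0 * a) - 1.0) / (2.0 * a)


def comparison_probabilities(x1, x2):
    """(P(X >= x), P(X <= x)) for x in [0,1]^2 or in [1,2]^2."""
    if max(x1, x2) <= 1.0:
        return (1.0 + (1.0 - x1) * (1.0 - x2)) / 2.0, x1 * x2 / 2.0
    if min(x1, x2) >= 1.0:
        return (2.0 - x1) * (2.0 - x2) / 2.0, (1.0 + (x1 - 1.0) * (x2 - 1.0)) / 2.0
    raise ValueError("closed forms are stated only for the two squares")


for tau in (0.05, 1 / 6, 0.3, 0.5, 0.8, 0.95):
    s = diagonal_quantile(tau)
    up, down = comparison_probabilities(s, s)
    print(f"tau = {tau:.3f}  s = {s:.4f}  index = {down / (up + down):.3f}  p = {up + down:.4f}")
```

Along the diagonal the probability of comparison is smallest at $s = 1/2$ (and, by symmetry, at $s = 3/2$), where it equals 3/4.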

Examples 5 and 6 show the impact the alignment of the probability
distribution with the partial order can have on the partial quantiles and on
$p_{x_\tau}$. This alignment is good in Example 6, and the partial quantiles
are on the main diagonal. For each $\tau$, any other point $x \in Q(\tau)$
will have a lower $p_x$ than $x_\tau$, the member of $Q(\tau)$ on the main
diagonal. Here the maximization of the probability of drawing a comparable
point leads to partial quantiles that are consistent with what we might
expect. In Example 5, on the other hand, the maximization of the probability
of drawing a comparable point leads to two partial quantiles for each value
of $\tau$. Each of these two partial quantiles seems reasonable in the
context of the square that it is in. Since the two squares are not in
alignment with the partial order, however, the two $\tau$-partial quantiles
for a given $\tau$ are disconnected. Results like this are to be expected
with such a lack of alignment. This is analogous to trying to identify a mode
with a bimodal distribution having widely separated modes.

There are extreme cases in which the probability distribution is not aligned 
at all with the partial order, as illustrated by Example 7. 

Example 7 (Noncomparable). Let $X \sim \mathrm{Uniform}(\Delta^{d-1})$, where
$d > 1$,

$$\Delta^{d-1} = \Big\{x \in \mathbb{R}^d : x \ge 0,\ \sum_{j=1}^{d} x_j = 1\Big\}$$

is the $(d-1)$-dimensional simplex, and $a \succcurlyeq b$ only if $a \ge b$
componentwise. In this case, no two points can be compared. Therefore, we
have $p_x = 0$ and
$P(X \succcurlyeq x \mid C(x)) = P(X \preccurlyeq x \mid C(x)) = 1$ for all
$x \in \Delta^{d-1}$. Definition 2 yields
$Q^*(\tau) = Q(\tau) = \Delta^{d-1}$ for all $\tau \in (0,1)$ and $p = 0$.

Although Example 7 might suggest a departure from the traditional quan- 
tile definition, it deals with the somewhat extreme case in which no points 
are comparable. This situation is in sharp contrast with the complete order 
that we are accustomed to in the univariate case. Nonetheless, it provides 
a meaningful illustration of a situation in which no point is better than any 
other if we rely only on the partial order. This situation is analogous to 
trying to compare points on a Pareto-efficient set, or an efficient frontier, 
where the points on the frontier dominate other points below and to the left
of the frontier but the partial order does not allow us to say that any point
on the efficient frontier is better than any other. 

Next, we consider the case of a complete order in detail, as described 
earlier. Note that many complete orders are not partial orders since anti- 
symmetry might fail. Nonetheless, all the quantities proposed here can be 
defined analogously. 

Example 8 (Complete order). Suppose that the binary relation $\preccurlyeq$
can be represented by a real-valued measurable function, that is,
$x \succcurlyeq y$ if and only if $u(x) \ge u(y)$ for some
$u : S \to \mathbb{R}$. This is a well-behaved case in which we have a
complete order in $S$. Therefore, we have

$$P(X \succcurlyeq x_\tau) = P(u(X) \ge u(x_\tau)) \ge 1 - \tau \quad\text{and}\quad P(u(X) \le u(x_\tau)) \ge \tau.$$

Consider the (standard) quantile curve $q_{u(X)} : (0,1) \to \mathbb{R}$ of
the random variable $u(X)$. Then $p_x = p_\tau = p = 1$,
$\tau_x = q_{u(X)}^{-1}(u(x))$, $Q(\tau) = u^{-1}(q_{u(X)}(\tau))$ and
$Q^*(\tau) = Q(\tau)$.

The situation described in Example 8 is encountered, for example, in deci- 
sion analysis when the consequences in a decision-making problem are multi- 
dimensional in nature and u might be represented by a payoff or utility func- 
tion (e.g., Keeney and Raiffa [30]). We emphasize that the reparametrization 
allows us to reduce to the standard univariate case, but the partial quantiles 
in the original space S would be given by the preimage of the function u 
and could have an arbitrary geometry even if we have an interval (possibly 
a point) in terms of u. 

In the following example, a random set is the random element of interest 
in the appropriate space under the inclusion ordering (see Molchanov [37] 
for precise definitions). 

Example 9 (Interval covering). Let $S$ be the set of all closed intervals
on $[0,1]$, and let $X$ be a closed random interval,

$$X = [\xi_1, \xi_2], \qquad \xi_j \sim \mathrm{Uniform}([0,1]) \quad \text{for } j = 1, 2.$$

The partial order is given by $a \succcurlyeq b$ only if $b \subseteq a$. Let
$x = [x_1, x_2] \subseteq [0,1]$ be an interval. Then we have

$$P(X \succcurlyeq x) = 2x_1(1 - x_2) \quad\text{and}\quad P(X \preccurlyeq x) = |x_2 - x_1|^2,$$

which characterize the partial quantile surfaces. Using Anderson's lemma,
and letting $a(\tau) = \sqrt{2(1-\tau)/\tau}$, one can show that partial
quantiles are achieved on symmetric intervals centered at 1/2 and given by

$$x_\tau = \left[\frac{1}{2} - \frac{1}{2 + 2a(\tau)},\ \frac{1}{2} + \frac{1}{2 + 2a(\tau)}\right]$$

and

$$p_\tau = \frac{1}{(1 + a(\tau))^2} + 2\left(\frac{1}{2} - \frac{1}{2 + 2a(\tau)}\right)^2.$$
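The expressions of Example 9 can be verified numerically. The sketch below checks that the symmetric interval has the intended partial quantile index and that the two displayed forms of the probability of comparison agree. The Monte Carlo part additionally assumes (this is an interpretation, not stated explicitly in the text) that the endpoints of the random interval are the minimum and maximum of two independent uniform draws, which is the reading under which the two comparison probabilities above hold. Function names are hypothetical.

```python
import math
import random


def a(tau):
    return math.sqrt(2.0 * (1.0 - tau) / tau)


def interval_quantile(tau):
    """Symmetric interval around 1/2 whose partial quantile index equals tau."""
    half_width = 1.0 / (2.0 + 2.0 * a(tau))
    return 0.5 - half_width, 0.5 + half_width


def comparison_probabilities(x1, x2):
    """(P(X contains x), P(X is contained in x)) for x = [x1, x2]."""
    return 2.0 * x1 * (1.0 - x2), (x2 - x1) ** 2


def p_tau(tau):
    """Closed form for the probability of comparison at the tau-partial quantile."""
    return 1.0 / (1.0 + a(tau)) ** 2 + 2.0 * (0.5 - 1.0 / (2.0 + 2.0 * a(tau))) ** 2


def monte_carlo(x1, x2, n, rng):
    """Simulate X = [min(xi1, xi2), max(xi1, xi2)] with independent uniforms."""
    contains = inside = 0
    for _ in range(n):
        lo, hi = sorted((rng.random(), rng.random()))
        contains += lo <= x1 and hi >= x2
        inside += lo >= x1 and hi <= x2
    return contains / n, inside / n


for tau in (0.2, 0.5, 0.9):
    x1, x2 = interval_quantile(tau)
    up, down = comparison_probabilities(x1, x2)
    print(tau, (round(x1, 4), round(x2, 4)), round(down / (up + down), 4), round(up + down, 4))
print(monte_carlo(0.3, 0.6, 20000, random.Random(1)))
```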
Next, we consider an example of a discrete set $S$.

Example 10 (Partial order based on acyclic directed graphs). Let $X$
be a uniform random variable on $S = \{a,b,c,d,e,f,g,h,i,j,k\}$. The partial
order relation is given by an acyclic directed graph, as in Figure 3(a), and
$x \preccurlyeq y$ if there is a path from $x$ to $y$ in the graph.
Figure 3(b) illustrates how the partial order relation impacts the partial
quantile indices and probabilities of comparison. Note also that
$P(X \preccurlyeq f) \ge 0.5$ and $P(X \succcurlyeq f) \ge 0.5$, making $f$
the partial median.
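For a finite set ordered by paths in an acyclic directed graph, the two comparison probabilities reduce to reachability counts. The sketch below illustrates this on a small hypothetical graph; it is not the graph of Figure 3, which is not reproduced here. It assumes that an arc from one node to another places the first node below the second, and that every node is comparable with itself. Partial medians are then identified through the condition used in the example, namely that both comparison probabilities are at least 1/2.

```python
from fractions import Fraction

# Hypothetical acyclic directed graph (NOT the graph of Figure 3):
# an arc x -> y means x is directly below y.
GRAPH = {"a": ["c"], "b": ["c"], "c": ["d", "e"], "d": [], "e": []}


def reachable(graph, start):
    """All nodes reachable from start by a directed path, including start itself."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for successor in graph[node]:
            if successor not in seen:
                seen.add(successor)
                stack.append(successor)
    return seen


def comparison_probabilities(graph, prob):
    """Map each node x to (P(X >= x), P(X <= x)) under the path order."""
    above = {x: reachable(graph, x) for x in graph}  # above[x] = {y : x <= y}
    result = {}
    for x in graph:
        p_up = sum(prob[y] for y in above[x])
        p_down = sum(prob[y] for y in graph if x in above[y])
        result[x] = (p_up, p_down)
    return result


def partial_medians(graph, prob):
    """Nodes x with P(X <= x) >= 1/2 and P(X >= x) >= 1/2."""
    half = Fraction(1, 2)
    probs = comparison_probabilities(graph, prob)
    return [x for x, (p_up, p_down) in probs.items() if p_up >= half and p_down >= half]


uniform = {x: Fraction(1, len(GRAPH)) for x in GRAPH}
print(comparison_probabilities(GRAPH, uniform))
print(partial_medians(GRAPH, uniform))
```

Exact rational arithmetic is used so that the comparison with 1/2 is not affected by rounding.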

We conclude the examples with a binary relation that is not transitive. 

Example 11 (Nontransitive binary relation). Let $X$ be a random variable
with values in $S = \{a,b,c\}$, $P(X = a) = 1/2$, $P(X = b) = 1/3$ and
$P(X = c) = 1/6$. The binary relation is given by a directed graph, as in
Figure 4, and $x \preccurlyeq y$ if there is an arc from $x$ to $y$ in the
graph. The cycle in the graph indicates that the binary relation is not
transitive. We note that in this particular example, there are no extreme
partial quantiles. That is, the partial quantile surfaces are
$Q(\tau) = \emptyset$ for $\tau$ sufficiently close to 0 or 1.



Fig. 4. The cyclic directed graph with $x \preccurlyeq y$ if there is an arc
from $x$ to $y$. The cycle indicates that the binary relation is not
transitive. Moreover, there are no extreme partial quantiles in this example.

5.1. Illustration of estimation: The unit square example. In order to
illustrate previous results and statements from Sections 2, 3, 3.4 and 4, we
consider Example 4 in detail. In this case, $S = [0,1]^2$, the probability
distribution $P$ is the uniform distribution on $[0,1]^2$, and the partial
order is given by $a \succcurlyeq b$ only if $a \ge b$ (i.e., $a_1 \ge b_1$
and $a_2 \ge b_2$), which is a conic order with $K = \mathbb{R}^d_+$. For
convenience, we denote the dimension of $S$ by $d = 2$.

The class of sets
$\mathcal{T} = \{C(x), \{y \in S : y \preccurlyeq x\}, \{y \in S : y \succcurlyeq x\} : x \in S\}$
is a VC class of sets whose VC dimension is of the order $d$, so we have
$v(\mathcal{T}) \lesssim d$. We consider the metric to be the usual Euclidean
norm $d(x,y) = \|x - y\|$. From Theorem 8, we have $p = 1/2^{d-1}$.

Condition E.2 holds with $v(p) \lesssim d/p^2$. Condition E.3 for
$\tau \in (0,1)$ holds with $\alpha = 2$ and $c = 1/2^d$ (note that for
$\tau \in \{0,1\}$ we would have $\alpha = 1$). Condition E.4 holds with
$\gamma = 2$ for $\tau = 0.5$ and $\gamma = 1$ otherwise. Condition E.5 holds
with $\phi_n(r) \lesssim (r^{1/2} + n^{-1/4})\sqrt{\log n}$ by applying
maximal inequalities (the $\log n$ term can be dropped if we are interested
in a single quantile). Finally, condition E.6 holds by a uniform central
limit theorem over $\mathcal{T}$ (see Dudley [16], Theorem 3.7.2, or van der
Vaart and Wellner [57], Theorem 2.5.2).

In Figures 5 and 6, we display the estimated partial quantile indices and 
points for the case of d = 2 with a sample size of n = 5,000. Note that the 
graph of the estimated partial quantile indices in Figure 5 looks very similar 
to the graph of the true partial quantile indices in Figure 1. The difference 
between the true and estimated values is also shown in Figure 5. In light of 
Theorem 1, the partial quantile surface is estimated uniformly over $\mathcal{C}_p$ at an
$n^{1/2}$-rate of convergence if $p$ is fixed. We see from the difference between the
true and estimated values in Figure 5 that the convergence is slower at the 
top left and bottom right corners, which correspond to points with small 
probabilities of comparison p x . 
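The estimation step described above can be sketched in a few lines. The code below is a plain plug-in illustration under stated assumptions, not the authors' implementation: it replaces the two comparison probabilities by empirical frequencies from a simulated uniform sample of size 5,000 on the unit square and compares the resulting estimates with the closed forms of Example 4 at a central point and at a point near the top left corner. The seed and the evaluation points are arbitrary choices.

```python
import random


def estimate_at(sample, x):
    """Empirical partial quantile index and probability of comparison at x."""
    down = sum(1 for y in sample if y[0] <= x[0] and y[1] <= x[1])
    up = sum(1 for y in sample if y[0] >= x[0] and y[1] >= x[1])
    p_hat = (up + down) / len(sample)
    tau_hat = down / (up + down) if up + down > 0 else float("nan")
    return tau_hat, p_hat


def truth_at(x):
    """Closed forms for the uniform distribution on the unit square."""
    up, down = (1 - x[0]) * (1 - x[1]), x[0] * x[1]
    return down / (up + down), up + down


rng = random.Random(2024)
sample = [(rng.random(), rng.random()) for _ in range(5000)]
for x in [(0.5, 0.5), (0.3, 0.6), (0.1, 0.9)]:
    tau_hat, p_hat = estimate_at(sample, x)
    tau, p = truth_at(x)
    print(x, f"tau: {tau_hat:.3f} vs {tau:.3f}", f"p: {p_hat:.3f} vs {p:.3f}")
```

The index estimate at a point only uses the sample points comparable with it, so its effective sample size is roughly the sample size times the probability of comparison; this is one way to read the slower convergence near the corners where that probability is small.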




Fig. 5. (a) Estimated partial quantile indices and (b) the difference between
the estimated and true partial quantile indices for uniform samples on the
unit square.

Although the exact partial quantiles fall on the $x_1 = x_2$ diagonal, we can
see from the few quantiles labeled in Figure 6 that they are not evenly
spaced along the diagonal. Instead, they are closer together for $\tau$ near
0.5 and more spread out as $\tau \to 0$ or 1. Moreover, the exact and
estimated values of $p_\tau$ are smaller for $\tau$ near 0.5 (the minimum
value of the exact $p_\tau$ is $p_{0.5} = 0.5$) and grow larger as
$\tau \to 0$ or 1. The estimated quantiles in Figure 6 are close to but not
equal to the true quantiles. Also, there is a slight violation of
monotonicity in the estimated quantiles, a point we will expand upon later.

If we are interested in computing partial quantiles only for the case of
$\mathcal{U} = \{1/2\}$, we can take $\gamma = 2$, which yields an
$n^{1/3}$-rate of convergence by Theorem 2. Note that for
$\mathcal{U} = \{0, 1\}$ we have $\gamma = 1$ and $\alpha = 1$, which also
leads us to an $n^{1/3}$-rate of convergence by Theorem 2. On the other hand,
if we are interested in computing for a nondegenerate interval $\mathcal{U}$
of quantiles, we have that $\gamma = 1$, which leads to an $n^{1/4}$-rate of
convergence.

Figure 7 illustrates the application of the rearrangement procedure proposed
here to the estimated partial quantiles in Figure 6, which violated
monotonicity for $\tau \in [0.35, 0.40]$. The rearrangement results in
estimated partial quantiles that coincide with the original estimates except
for $\tau \in [0.35, 0.40]$, where they are modified to eliminate the
violation of monotonicity.
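One simple way to picture a componentwise rearrangement is sketched below. This is an assumption about the form of the procedure for the componentwise natural order (sorting each coordinate of the estimated points separately across the grid of quantile indices), offered only as an illustration; the procedure defined earlier in the paper should be taken as authoritative. The input values are hypothetical.

```python
def componentwise_rearrangement(points):
    """Sort each coordinate separately across the quantile grid.

    `points` lists the estimated quantile points in increasing order of tau.
    """
    sorted_columns = [sorted(column) for column in zip(*points)]
    return [tuple(row) for row in zip(*sorted_columns)]


def is_monotone(points):
    """True if consecutive points are increasing in the componentwise order."""
    return all(
        all(u <= v for u, v in zip(p, q)) for p, q in zip(points, points[1:])
    )


# Hypothetical estimates with a small violation between the 2nd and 3rd points.
estimates = [(0.30, 0.31), (0.42, 0.40), (0.41, 0.43), (0.50, 0.52)]
rearranged = componentwise_rearrangement(estimates)
print(is_monotone(estimates), is_monotone(rearranged))
print(rearranged)
```

A sequence that is already monotone is left unchanged by this operation, which matches the description that the rearranged estimates coincide with the original ones away from the violation.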

Exact and estimated dispersion regions with $\eta = g(\theta) = \theta$ for
Example 4 are shown in Figure 8, corresponding to the exact and estimated
partial quantile indices given in Figures 1 and 5. The dispersion regions
seem intuitively reasonable, and the estimated regions are quite similar to
the exact regions. The dispersion regions for high values of $\theta$ extend
out toward $(0,1)'$ and $(1,0)'$, to regions where the probabilities of
comparison are low.

Fig. 7. The componentwise rearrangement procedure applied to the estimated
partial quantiles from Figure 6.

6. Applications. In this section, we use the concept of partial quantiles
in three empirical applications: one involving the intake of dietary
components, one involving the performance of mutual funds, and one involving
the impact of interventions on tobacco and health knowledge. Our goal is not
to do a detailed, full-scale analysis in each case, but to briefly illustrate
the use of partial quantiles and show some of the capabilities of the
concepts and measures discussed here. In particular, partial quantiles
provide useful graphical and quantitative summaries of the data.

Fig. 8. (a) True and (b) estimated dispersion regions
$\mathcal{R}(\theta,\theta)$ for Example 4, with the boundaries of the
regions labeled by $\theta$.

6.1. Intake nutrients within diets. Quantitative information regarding
the intake distribution of several dietary components (e.g., calcium, iron,
protein, Vitamin A and Vitamin C) has been collected by the U.S. Department
of Agriculture (USDA) through periodic surveys. This information
is used to formulate food assistance programs, consumer education efforts,
and food regulatory activities. One important concept in analyzing food
consumption data is the usual intake, defined as the long-run average of
daily intakes of dietary components by individuals. Nusser et al. [42]
propose an approach that assumes the existence of a transformation of the
data such that both the original distribution and measurement errors are
normally distributed. Among other relevant statistics, they estimate the
quantiles of several dietary components, focusing on each component
separately.

Fig. 9. (a) Data (scatter diagram) and partial quantiles, and (b) estimated
probabilities of comparison $p_\tau$ for the (multidimensional) iron and
protein levels in food intakes.

For simplicity, we consider only two dietary components, daily intakes 
of iron (in milligrams) and protein (in grams), in our analysis. The partial 
order is the componentwise natural order. Partial quantiles are relevant in 
this situation because not all pairs of diets (as summarized by their usual 
intakes) are necessarily comparable in the sense that we can say that one 
of the pair is "better" than the other. If one diet has more iron and the 
other has more protein, for example, they are not comparable. We recognize 
that this partial order rule may not hold for all values of the intakes. At 
extremely high levels of a component, it may be undesirable to increase the 
intake yet further, but we will assume that the partial order holds within 
the range of the data. Another factor that can be relevant is that intakes of 
different dietary components are not independent. With this partial order, 
for example, a positive correlation between iron intakes and protein intakes 
is more in alignment with the partial order and will lead to higher probabili- 
ties of comparison than a negative correlation. Therefore, understanding this 
dependence can be important in designing policies such as those mentioned 
above. Moreover, the invariance of partial quantiles under order-preserving
transformations is important since different components tend to have
different scales.

The data we use are a subset of the data from the 1985 Continuing Survey
of Food Intakes by Individuals (CSFII) [56], a data source used in [42].
A scatter diagram of the data is given in Figure 9, which indicates that 
the data are quite well-aligned with the partial order. The estimated partial 
quantiles shown on this scatter diagram are monotonically increasing (in 
terms of the partial order) in r. We would expect to see some diets that are 
not comparable. Different people may tend to emphasize different types of 
foods, with different mixes of nutrients, in their diets. Nonetheless, the data 
indicate that all of the estimated partial quantiles are comparable with
more than 78% of the sampled diets, as can be seen from Figure 9. This sug-
gests that partial quantiles can be interpreted very similarly to the usual uni- 
variate quantiles. For example, when deriving policies/activities/programs, 
the decision maker can consider the 0.5-partial quantile to be a reasonable 
representation of the "median" individual. Table 2 and Figure 10 display 
comparisons of estimated univariate quantiles and partial quantiles. In this 
case, the partial quantiles are slightly more concentrated around central val- 
ues than are the univariate quantiles. This reflects the intuitive notion that 
it is too extreme to interpret a componentwise univariate quantile as its mul- 
tidimensional counterpart. We note that the univariate quantiles in Table 2 
differ from those for the same nutrients in [42] because we present the stan- 
dard sample quantiles, whereas a measurement error model and assumptions 
of normality are used to generate estimated quantiles in [42]. 
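The link between dependence and comparability discussed above can be illustrated by a small simulation. The sketch below is not based on the CSFII data: it assumes, purely for illustration, a standard bivariate normal distribution with correlation $\rho$ and estimates the probability that two independent draws are comparable in the componentwise order, which is the average of $p_x$ over a random point. For this Gaussian model the exact value is $1/2 + \arcsin(\rho)/\pi$, a standard orthant probability, and it is printed next to the simulated value.

```python
import math
import random


def comparable(a, b):
    """True if a >= b or a <= b in the componentwise order."""
    return (a[0] >= b[0] and a[1] >= b[1]) or (a[0] <= b[0] and a[1] <= b[1])


def draw(rho, rng):
    """One draw from a standard bivariate normal with correlation rho."""
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return z1, rho * z1 + math.sqrt(1.0 - rho * rho) * z2


def average_comparability(rho, n_pairs, rng):
    """Monte Carlo estimate of P(two independent draws are comparable)."""
    hits = sum(comparable(draw(rho, rng), draw(rho, rng)) for _ in range(n_pairs))
    return hits / n_pairs


rng = random.Random(7)
for rho in (-0.8, 0.0, 0.8):
    exact = 0.5 + math.asin(rho) / math.pi  # Gaussian orthant probability
    print(rho, round(average_comparability(rho, 20000, rng), 3), round(exact, 3))
```

Under this model a strong positive correlation makes most pairs comparable, while a strong negative correlation makes most pairs noncomparable, in line with the qualitative statement about alignment with the partial order.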

Table 2
Comparison between estimated univariate quantiles and partial quantiles for
iron and protein intakes

Quantile        Univariate quantile         Partial quantile
index ($\tau$)  Iron (mg)   Protein (g)     Iron (mg)   Protein (g)
0.1              4.51        25.95           4.69        25.97
0.2              5.99        35.62           6.16        37.51
0.25             6.61        39.89           6.74        41.83
0.3              7.12        43.53           7.33        44.72
0.4              8.11        49.63           8.21        50.49
0.5              9.12        56.48           9.09        59.14
0.6             10.29        63.61           9.97        62.03
0.7             11.47        70.81          10.85        67.80
0.75            12.30        75.50          11.44        73.57
0.8             13.25        80.82          12.61        76.45
0.9             16.30        95.34          15.84        87.99

Figure 11 gives more details, showing the estimated partial quantile indices
$\tau_x$ and the probabilities of comparison $p_x$ for all $x$. The borders
between
colors indicating the partial quantile indices capture the shape of the "qual- 
ity" of the diets in a comparative sense and show that the partial quantile 
surfaces appear convex for these data. For example, a subject with levels 
of iron and protein of (17.894,87.995) will be on the 0.95 partial quantile 
surface among diets that are comparable with her diet, since her diet is on 
the upper right-hand border of the light red band in Figure 11(a). This bor- 
der can be thought of as a partially efficient frontier of the intake of iron 
and protein at a 95% level in this application since any diets on that border 
are better than 95% of the comparable diets. Moreover, this partial quantile 
surface allows us to consider comparative statics of the changes needed to 
stay at the same partial quantile level but with higher probabilities of com- 
parison. Note that the graph of the probabilities of comparison is roughly 
symmetric, with $p_x$ decreasing as we move away from the rough "axis of
symmetry" along a particular partial quantile surface. This is consistent
with the location of the partial quantiles in Figure 9. Figure 12 provides
yet additional information by showing the regions
$\mathcal{R}(\theta,\theta)$ from the dispersion measure in (4.4) for
selected values of $\theta$.

Fig. 11. (a) Estimated partial quantile indices and (b) estimated
probabilities of comparison for levels of iron and protein in food intakes.

Fig. 12. The dispersion measure $\mathcal{R}(\theta,\theta)$ based on
estimated partial quantiles for the (multidimensional) iron and protein
levels in food intakes. The boundaries of the regions are labeled by
$\theta$.

6.2. Evaluating investment funds. Next, we consider evaluating the
performance of investment funds. Several indices have been considered toward
this end in the finance literature. A central approach is to regress the
return of the fund ($R_F$) above the return on the risk-free asset ($r$)
against the return of the market ($R_M$) above the return on the risk-free
asset,

$$(R_F - r) = \alpha + \beta(R_M - r),$$

which arises from a standard CAPM model (e.g., [53]). The exposure with
respect to $\beta$ should not be rewarded, and higher values of the intercept
$\alpha$, the risk-adjusted return (i.e., the expected return on the fund
when the market yields a return of zero), should be rewarded.

An emerging literature within finance advocates that in addition to the
risk-adjusted return, market timing should also be rewarded (see [13, 26,
28, 60] and the references therein). The difference between returns on the
market and returns on the fund can be broken down by whether they are
positive or negative to capture market timing [13]:

(6.1)  $(R_F - r) = \alpha + \beta^+ \max\{R_M - r, 0\} + \beta^- \min\{R_M - r, 0\}.$

Note that $\max\{R_M - r, 0\} \ge 0$ and $\min\{R_M - r, 0\} \le 0$; a better
performance would have $\beta^+$ positive (the more positive the better) and
$\beta^-$ negative (the more negative the better). Therefore, in the model
(6.1), the quantity $\Delta := \beta^+ - \beta^-$ captures the market timing
ability of the fund. Once again, the partial order that we will use for the
pair $(\alpha, \Delta)$ is the componentwise natural order.

Fig. 13. Data, estimated partial quantiles, and estimated probabilities of
comparison for the performance of investment funds.
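To show how the pair of intercept and market timing measure in model (6.1) could be obtained for a single fund, the sketch below fits the model by ordinary least squares. The choice of least squares, the function names and the synthetic return series are assumptions made for illustration; the estimation method behind the data of [13] is not described here. The normal equations are solved with a small Gaussian elimination routine so that the example needs only the standard library.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_market_timing(excess_fund, excess_market):
    """Least-squares fit of (6.1); returns (alpha, beta_plus, beta_minus, delta)."""
    # Regressors: intercept, positive part and negative part of the market excess return.
    rows = [[1.0, max(v, 0.0), min(v, 0.0)] for v in excess_market]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, excess_fund)) for i in range(3)]
    alpha, beta_plus, beta_minus = solve_linear_system(xtx, xty)
    return alpha, beta_plus, beta_minus, beta_plus - beta_minus


# Synthetic, noise-free excess returns generated from known (hypothetical) coefficients.
market = [-0.04, -0.02, -0.01, 0.01, 0.03, 0.05]
fund = [0.002 + 1.2 * max(v, 0.0) + 0.7 * min(v, 0.0) for v in market]
print(fit_market_timing(fund, market))
```

Each fund then contributes one point of intercept and timing measure, and these points are the observations to which the componentwise order is applied.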

We use the data used by Andrade in [13]. Figure 13 shows the data, the 
estimated partial quantiles, and the associated probabilities of comparison. 
Since the partial order is not complete, we expect to have funds that are 
noncomparable. In contrast to the previous application, the data are not 
well-aligned with the partial order. It appears that $\alpha$ and $\Delta$ have a strong
negative correlation. As a result, the estimated values for the probabilities
of comparison $p_\tau$ are very small, always below 0.20 and with $p = 0.00651$.

Figure 14(a) shows that the partial quantile surfaces for different values
of $\tau$ are quite close to each other and, except for extreme values of
$\tau$, follow a pattern that is linear with a negative slope. This narrow
band passes through a region with probabilities of comparison quite low
everywhere, consistent with the above observation regarding Figure 13.
Therefore, small random variation can cause potentially large shifts in
partial quantile indices. As a result, the estimated partial quantiles are
not monotonic. When we apply the rearrangement procedure from Section 4, we
get the results shown in Figure 15. The rearranged partial quantiles are
monotonic, but note that many fall outside the support of the data.
Moreover, the $\ell_2(\mathcal{U})$ distance between the rearranged and the
original estimator of the partial quantile point process is 2.141 within the
range of $\tau \in (0.1, 0.9)$. These observations provide strong evidence
that the true partial quantiles are not partial-monotone in the sense of
(4.1).

Fig. 14. (a) Estimated partial quantile indices and (b) estimated
probabilities of comparison for the performance of investment funds.

Fig. 15. The componentwise rearrangement procedure applied to the estimated
partial quantile points for the performance of investment funds. The
difference is 2.141.

How can we interpret the results for this evaluation of investment funds? 
We suggest that the results provide some evidence that most (if not all) of 
the funds may actually be optimizing their choices and (up to random fluc- 
tuation) performing on the efficient frontier. Therefore, their performance is 
not dominated by many other funds, and when it is, the differences in per- 
formance are slight and seem consistent with random variation. Similarly, 
their performance does not dominate many other funds. This lack of much
domination in the data set would explain the low probabilities of compara- 
bility. Since funds have different targets for the ideal trade-off between risk 
and return, we should not be surprised to observe many points on or near 
different portions of the efficient frontier in the data, and the data seem to 
be consistent with this expectation. To some extent, this is very similar in 
spirit to Example 7, where no point is comparable with any other point. 

6.3. Tobacco and health knowledge scale (THKS). We consider the Television
School and Family Smoking Prevention and Cessation Project (TVSFP) study
(Flay et al. [19] and Gibbons and Hedeker [21]), which was designed to
test the effects of a school-based social resistance classroom curriculum and
a media (television) intervention program in terms of tobacco use prevention
and cessation. We refer the reader to [21] for the details of the experiment,
and we report the data collected in Table 3.

Table 3
Tobacco and health knowledge scale postintervention results: subgroup
frequencies (and percentages) [21]

Subgroup        THKS score
CC      TV      Pass          Fail          Total
No      No      175 (41.6)    246 (58.4)      421
No      Yes     201 (48.3)    215 (51.7)      416
Yes     No      240 (63.2)    140 (36.8)      380
Yes     Yes     231 (60.3)    152 (39.7)      383
Total           847 (52.9)    753 (47.1)    1,600

The partial order of the policy maker is to obtain a "Pass" over "Fail" 
regardless of the subgroup. For the same result of the THKS, given cost 
and political considerations, it is preferred not to have used social resistance 
classroom curriculum (CC) or a media (television) intervention (TV). How- 
ever, the subgroup with no CC and TV is not comparable to CC and no TV. 
The partial order is summarized by the acyclic directed graph in Figure 16. 

Based on the data of Table 3 and the partial order described in Figure 16,
we compute the partial quantile indices and probabilities of comparison; see
Figure 17.

In this application we note the high values of the probabilities of
comparison. That makes the interpretation of partial quantiles very similar
to traditional quantiles. In particular, the outcome "CC TV Pass" is such
that $P(X \succcurlyeq \text{"CC TV Pass"}) \ge 1/2$ and
$P(X \preccurlyeq \text{"CC TV Pass"}) \ge 1/2$, making "CC TV Pass" the
(partial) median.

Fig. 16. The partial order represented by an acyclic directed graph. We have
that $a \preccurlyeq b$ if there is a directed path from $a$ to $b$.

Fig. 17. Partial quantile indices and probabilities of comparison. According
to the partial order of the policy maker we have
$P(X \succcurlyeq \text{"CC TV Pass"}) \ge 1/2$ and
$P(X \preccurlyeq \text{"CC TV Pass"}) \ge 1/2$, making "CC TV Pass" the
(partial) median.
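The partial median in this application can be recomputed directly from the frequencies in Table 3. The sketch below encodes the partial order as it is described in the text, which is our reading of it rather than a transcription of Figure 16: any "Pass" outcome is preferred to any "Fail" outcome, and for the same THKS result an outcome is at least as good as another when it uses no more of each intervention (CC and TV). Function names are hypothetical.

```python
from fractions import Fraction

# Frequencies from Table 3, keyed by (CC, TV, THKS result); 1 = intervention used.
COUNTS = {
    (0, 0, "Pass"): 175, (0, 0, "Fail"): 246,
    (0, 1, "Pass"): 201, (0, 1, "Fail"): 215,
    (1, 0, "Pass"): 240, (1, 0, "Fail"): 140,
    (1, 1, "Pass"): 231, (1, 1, "Fail"): 152,
}


def at_least_as_good(a, b):
    """True if outcome a is weakly preferred to outcome b."""
    (cc_a, tv_a, res_a), (cc_b, tv_b, res_b) = a, b
    if res_a != res_b:
        return res_a == "Pass"
    # Same THKS result: using no more of each intervention is preferred.
    return cc_a <= cc_b and tv_a <= tv_b


def comparison_probabilities(x):
    """(P(X >= x), P(X <= x)) under the empirical distribution of Table 3."""
    total = sum(COUNTS.values())
    up = sum(n for y, n in COUNTS.items() if at_least_as_good(y, x))
    down = sum(n for y, n in COUNTS.items() if at_least_as_good(x, y))
    return Fraction(up, total), Fraction(down, total)


medians = [x for x in COUNTS if min(comparison_probabilities(x)) >= Fraction(1, 2)]
print(medians)
print(comparison_probabilities((1, 1, "Pass")))
```

Under this encoding, "CC TV Pass" is the only outcome for which both probabilities are at least 1/2: all 847 "Pass" outcomes are at least as good as it, and it is at least as good as itself and all 753 "Fail" outcomes.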

7. Conclusions. We propose a new generalization of quantiles to the
multivariate case based on a given partial order. An important feature of
our definition is that it is based only on the probability distribution and
on the partial order, which might or might not depend on the geometry of the
underlying space. It leads to a concept that has several desirable
properties, including robustness to outliers and equivariance/invariance
under transformations that preserve
the partial order. Several issues regarding estimation and computability are 
investigated and discussed. In particular, rates of convergence are derived, as 
are asymptotic distributions of many quantities, and efficient computation 
is shown for an important subclass of distributions and partial orders. 

The partial order is the additional structure exploited in this work. It 
is clear that partial quantiles depend crucially on the choice of the partial 
order. Therefore, their interpretation will also depend heavily on the partial 
order. We advocate that the choice of the partial order is application de- 
pendent. Thus, the relevance of these concepts for a particular application 
is linked with how meaningful the partial order is for that application. An 
alternative approach would be to choose the partial order to achieve par- 
tial quantiles with a desired property. For instance, one might want partial 
quantiles with high probabilities of comparison (which can be achieved with 
any binary relation that is a complete order), or partial quantiles that char- 
acterize the probability distribution (which can be achieved if the partial 
order induces a determining class), etc. Although these types of goals can 
be achieved by the appropriate choice of a partial order, it is very important 
for the partial order to make sense in the context of the specific application 
because the interpretation of all the concepts will be tied to that partial
order.

Many extensions of the concept of partial quantiles are possible. For in-
stance, the idea of embedding the partial quantile notion within a regression 
framework is of interest, as in [7-9, 24, 33]. Another possibility is to study 
the pattern of partial quantile surfaces conditional on covariates, since par- 
tial quantile surfaces also provide a meaningful generalization of the concept 
of an efficient frontier. 

Censored models have a wide range of applications and have attracted 
considerable interest due to their connection with quantiles observed by 
Powell [46-49] and others [6, 40, 41, 45, 61]. However, typical data
exhibit censoring in more than one variable. Because partial quantiles are
equivariant under order-preserving transformations, the proposed
generalization of quantiles is well suited to censored multidimensional data.

Moreover, another motivation to consider partial orders, or more gen- 
eral preferences, is the connection with the literature of decision theory. For 
example, the identification of axioms on the preferences that allow for statis- 
tical inference, computational tractability, etc., is of interest. Similarly, the 
identification of classes of decision problems for which partial quantiles play 
an important role in optimal strategies would be very valuable. Although 
the pursuit of these extensions is outside the scope of this paper, we believe 
that they provide questions of interest for future research. 



Proof of Proposition 1. This follows from the equivalence between
the events $\{h(X) \succcurlyeq h(Y)\}$ and $\{X \succcurlyeq Y\}$, and the
events $\{h(X) \preccurlyeq h(Y)\}$ and $\{X \preccurlyeq Y\}$. □

Proof of Proposition 2. If $m$ is an invariance mapping, it follows
that $\mathcal{C}(m(x)) = m(\mathcal{C}(x))$ and
$\{X \succcurlyeq m(x)\} = m(\{X \succcurlyeq x\})$. Therefore,



This implies that if $x \in \mathcal{Q}(\tau)$, then $m(x) \in \mathcal{Q}(\tau)$,
and if $x$ is a $\tau$-partial quantile, so is $m(x)$. □

Proof of Proposition 3. Since the binary relation is transitive,
$\{X \succcurlyeq x\} \subseteq \{X \succcurlyeq x'\}$ and
$\{X \preccurlyeq x\} \supseteq \{X \preccurlyeq x'\}$, so that
$P(X \succcurlyeq x') \ge P(X \succcurlyeq x) > 0$ and
$P(X \preccurlyeq x) \ge P(X \preccurlyeq x') > 0$. Therefore,

$$
\begin{aligned}
\tau_x = P(X \preccurlyeq x \mid \mathcal{C}(x))
 &= \frac{P(X \preccurlyeq x)}{P(X \preccurlyeq x) + P(X \succcurlyeq x)}
 \ge \frac{P(X \preccurlyeq x')}{P(X \preccurlyeq x') + P(X \succcurlyeq x)} \\
 &\ge \frac{P(X \preccurlyeq x')}{P(X \preccurlyeq x') + P(X \succcurlyeq x')}
 = \tau_{x'}.
\end{aligned}
$$

□



APPENDIX B: SECTION 3 PROOFS 



Proof of Lemma 1. We can assume that $X$ has a compact support
to ensure that integrals are well defined (and standard approximation
arguments yield the full result, or we are establishing probabilistic bounds
and the compact set is chosen to control the probability).

Since $K$ is a convex set, the associated class of functions $\mathcal{T}$ is
measurable and $\partial K$ has zero Lebesgue measure by Lemma 2.4.3 in
Dudley [16]. Moreover, $\mathcal{T}$ is a VC class of sets with VC index at
most $3d + 4$. Therefore, condition E.2 holds with $v(p) = (3d + 4)/p^2$.

Let $\sigma_0$ denote the surface measure on $\partial K$. To establish E.5, let

$$
\mu := \sup_{x \in \mathbb{R}^d} \int_{\partial(-K \cup K)} f(x + y)\, d\sigma_0(y) < \infty,
$$

since the support of $X$ is compact. Next, note that
$d(x,y) = \|x - y\| \ge E[|1\{X \in \mathcal{C}(x)\} - 1\{X \in \mathcal{C}(y)\}|^2]/\mu$.
Then E.5 holds with
$\phi_n(r) \lesssim (\sqrt{\mu r} + n^{-1/4})\sqrt{\log n}$ by Theorem 2.14.17
of van der Vaart and Wellner [57]. If $\mathcal{U}$ is a singleton, the bound
on $\phi_n(r)$ can be improved using arguments in Kim and Pollard [31].

Since $\mathcal{T}$ is a VC class and $K$ is a convex set, which ensures enough
measurability, E.6 holds by Theorem 2.6.8 in van der Vaart and Wellner [57].

To establish E.3, building upon Section 5 in Kim and Pollard [31], note
that

$$
\nabla \tau_x = \frac{1}{p_x} \int_{\partial(-K)} f(x + y)\, n_{(-K)}(y)\, d\sigma_0(y)
 - \frac{\tau_x}{p_x} \int_{\partial(-K \cup K)} f(x + y)\, n_{(-K \cup K)}(y)\, d\sigma_0(y)
$$

and

$$
\nabla^2 p_x = \int_{\partial(-K \cup K)} \nabla f(x + y)\, n_{(-K \cup K)}(y)'\, d\sigma_0(y),
$$

where $n_A(y)$ is the outward pointing unit vector normal to $\partial A$ at $y$.
Letting $B_1 = \partial(-K \cup K) \cap \partial(-K)$ and
$B_2 = \partial(-K \cup K) \setminus \partial(-K) \subset \partial K$, we have

$$
\nabla \tau_x = \frac{1}{p_x} \int \bigl( 1\{y \in B_1^c\} + (1 - \tau_x) 1\{y \in B_1\}
 + \tau_x 1\{y \in -B_2\} \bigr)\, f(x + y)\, n_{(-K)}(y)\, d\sigma_0(y).
$$

Since $-K$ is a convex cone with nonempty interior, the normal vectors
cannot be (positively) linearly dependent. Therefore, we have
$\nabla \tau_x \neq 0$ for any $x$ in the interior of the support of the random
variable $X$. Therefore, $\mathcal{Q}(\tau) = \{x : \tau_x = \tau\}$ is a
continuously differentiable hypersurface for every $\tau \in (0, 1)$ by the
Global Implicit Function theorem. The smoothness of $p_x$ and
$\mathcal{Q}(\tau)$ yields condition E.3 with $\alpha = 2$ for all
$\tau \in \mathcal{U}$.

Also, $p_x = \int_{-K \cup K} f(x + y)\, dy$ and
$\tau_x = (1/p_x) \int_{-K} f(x + y)\, dy$ are twice differentiable functions.
Therefore, $p_\tau$ is Lipschitz for $\tau \in \mathcal{U}$ since
$\mathcal{U} \subset (0, 1)$ is compact and $p > 0$ under our conditions. Thus,
condition E.4(i) is satisfied with $\gamma = 1$. Moreover, continuity of $p_x$
and $\tau_x$ also implies that the mapping $\mathcal{Q}^*(\tau)$ is upper
semicontinuous. □

Proof of Lemma 2. The bound on E.2 follows from the union bound.
Condition E.3 follows from the finite cardinality of $\mathcal{S}$ since for
$x \in \mathcal{Q}(\tau) \setminus \mathcal{Q}^*(\tau)$ we have $p_x < p_\tau$
and for $x \in \mathcal{Q}^*(\tau)$ we have $p_x = p_\tau$. Take
$c = \min_{\tau \in \mathcal{U}} \{ p_\tau - \max_{x \in \mathcal{Q}(\tau) \setminus \mathcal{Q}^*(\tau)} p_x \} > 0$,
since $\mathcal{U}$ is compact. Condition E.5 follows similarly to E.2, noting
that for $d(x,y) < 1$ we have $x = y$. Condition E.6 follows trivially.
Finally, E.4 follows by noting that $p_\tau$ and $x_\tau$ are piecewise
constant mappings with a finite number of jumps. Thus, if $\mathcal{U}$ does
not include the indices corresponding to these jumps, E.4 holds trivially. □

Proof of Theorem 1. For convenience, let $W_x = \{X \preccurlyeq x\}$. Then, for
all $x \in \mathcal{S}$ such that $p_x \ge p$ we have, by condition E.2,

$$
\begin{aligned}
|\hat\tau_x - \tau_x|
 &= \left| \frac{P_n(W_x) - P(W_x)}{\hat p_x}
 + P(W_x) \left( \frac{1}{\hat p_x} - \frac{1}{p_x} \right) \right| \\
 &\le \frac{|P_n(W_x) - P(W_x)|}{\hat p_x} + \frac{|\hat p_x - p_x|}{\hat p_x}
 \lesssim_P \sqrt{v(p)/n}.
\end{aligned}
$$

□



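Lemma 8 (Technical lemma). Let $0 < \varepsilon_1 \vee \varepsilon_2 < \varepsilon_3 < 1/2$
and $f, g, h : [0, 1] \to [0, 1]$, such that for all $t \in [0, 1]$,

$$
\limsup_{t_k \to t} f(t_k) \le f(t) + \varepsilon_1, \qquad
\limsup_{t_k \to t} g(t_k) \le g(t) + \varepsilon_1 \qquad \text{and} \qquad
\liminf_{t_k \to t} h(t_k) \ge h(t) - \varepsilon_1.
\tag{B.1}
$$

Moreover, assume that $\varepsilon_2 < \varepsilon_3 \min_{t \in [0,1]} h(t)$, and
for every $t \in [\varepsilon_3, 1 - \varepsilon_3]$:

(i) $|f(t) - t h(t)| \le \varepsilon_2$,

(ii) $|g(t) - (1 - t) h(t)| \le \varepsilon_2$ and

(iii) $f(t) + g(t) \ge h(t)$.

Then, for every $\tau \in (3\varepsilon_3, 1 - 3\varepsilon_3)$ there is $\bar t$
such that $f(\bar t) \ge \tau h(\bar t) - 2\varepsilon_1$ and
$g(\bar t) \ge (1 - \tau) h(\bar t) - 2\varepsilon_1$.

Proof. Let $\bar t = \sup\{ t \in [\varepsilon_3, 1 - \varepsilon_3] : g(t) \ge (1 - \tau) h(t) \}$.
We have that
$g(2\varepsilon_3) \ge (1 - 2\varepsilon_3) h(2\varepsilon_3) - \varepsilon_2
 = (1 - \tau) h(2\varepsilon_3) + (\tau - 2\varepsilon_3) h(2\varepsilon_3) - \varepsilon_2
 > (1 - \tau) h(2\varepsilon_3)$
by the assumption on $\varepsilon_2$ and $\tau$. Similarly,
$g(1 - 2\varepsilon_3) \le 2\varepsilon_3 h(1 - 2\varepsilon_3) + \varepsilon_2
 < (1 - \tau) h(1 - 2\varepsilon_3)$.
So $\bar t \in [2\varepsilon_3, 1 - 2\varepsilon_3]$.

Moreover, the condition (B.1) on $g$ and $h$ implies that
$g(\bar t) \ge (1 - \tau) h(\bar t) - 2\varepsilon_1$ and, by the definition of
$\bar t$, $g(\bar t + \mu) < (1 - \tau) h(\bar t + \mu)$ for every $\mu > 0$.
Thus, $f(\bar t + \mu) > \tau h(\bar t + \mu)$ for every $\mu > 0$ by (iii). In
turn, condition (B.1) for $f$ and $h$ yields
$f(\bar t) \ge \tau h(\bar t) - 2\varepsilon_1$, which establishes the result. □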



Proof of Theorem 2. The proof proceeds in steps. Step 1 establishes
feasibility of a "near" partial quantile point. Step 2 derives the main
arguments. Step 3 concludes the proof.

Step 1. Feasibility of near partial quantile point. Note that for any point
$x$ that is feasible for (3.6) we have
$|\tau - \hat\tau_x| \le \varepsilon_n / \hat p_x$. Moreover, by Theorem 1,
if also $p_x \ge p$, we have $|\hat\tau_x - \tau_x| \lesssim_P \sqrt{v(p)/n}$, so
that $|\tau - \tau_x| \lesssim_P u_n := \sqrt{v(p)/n} + \varepsilon_n / p$.

Assume that e n > Pick an arbitrary x T G Q*(t). By condition E.4, 
there is a continuous path of quantile points, V = {x T > :r' G (0,1)}, that 
passes through x T . Let ei = e n /2, e 2 = y/v(p)/n and e 3 = (1/6) min ueW u A 
(1 — u), so that f(t) = F n (X =4 %t)i g(t) = ^n(X !j= xt), and h(t) =p Xt satis- 
fies condition (B.l), (i) and (ii) by Theorem 1 and (iii) by definition. By 
Lemma 8, there exists x T * G V that is feasible for (3.6). Since p T * > p, 
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we have |r — r*| <p u n . On the other hand, if e n > e^' , x T G Q*(t) is it- 
self feasible with high probability. We can take x T * = x T and the relation 
|t — t*\ <pu n would still hold. 

Step 2. Main argument. We will derive the rate of convergence by bound- 
ing 

Pt St ~ Px T = Pt St ~Pt+ Px T ~ Px T 

from above using E.5 and the optimality of x T , and from below using the 
restricted identification condition E.2. 

To establish the upper bound first note that by optimality of x T , we have 
Px T * <Px T and using E.5, 

Px T -Px T <P <f>n(d(x T ,x T ))/yfn+p Xr -p$ T 
< P (f> n (d(x T ,x T ))/y/n + p Xr -px T ,. 

Applying E.5 one more time, and using that \t* — t\ <p u n so that d(x T *, 
x T ) <p u n and p Xr - p x ^ < p ul, 

Px T -Px T <P 4> n {d{x T ,X T ))/^/n + (pn(d(x T ,X T *))/^ + p XT ~Px T * 

<P <pn(d(x T ,x T ))/^/n + (j) n (u n )/^/n + ul. 

Also, since |r $T - r| < P u n , by E.4, p TSr - p T < P u%. 

Note that if d(x T ,x T ) <p vZ we are done. Therefore, the relations above 
yields 

Pr Sr -Px T <P 4>n{d{x T ,X T ))/^/n + ul. 

By E.3, we can minorate the left-hand side and obtain 

cA inf d(x T ,z) a < P (j) n (d(x T ,x T ))/^ + ul. 

Since the argument holds for all x T E Q*(t), we have 

cA inf d(x T ,z) a < P cj) n ( inf d(x T , x T ) ) j\fn + u^. 

z£Q*{t St ) \x t £Q*(t) / 

Next note that the minimum in the left-hand side cannot be c as n grows 
[since (f> n (d(x T , x T )) can be bounded by I^J v(p/2) = o{n 1 / 2 ) by Theorem 1]. 

Step 3. Conclusion of the proof. Using that a > 1 by E.3, E.4(ii), and 
the last relation in Step 2, 

inf d(x T ,x T ) 

x t €Q*(t) 

< inf d(x T , z) + d(z, x T ) 
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< inf d(x T ,z) + \t -t$ t \ 
zgQ*(t £t ) 

<P<^ /a ( inf d{x T ,x T ))/n l ' 2a + u^ a +u n 

< pUn V</ a V^/ a ( inf d{x T ,x T ))ln l ' 2a . 

The rate result follows as in [57]. □ 

Proof of Corollary 1. Since the order is complete, $p_x = \hat p_x = 1$ for
every $x \in \mathcal{S}$. In particular, condition E.5 is satisfied with
$\phi_n(r) = 0$, E.4 with $\gamma = \alpha$ and E.3 with any positive $\alpha$
since $\mathcal{Q}^*(\tau) = \mathcal{Q}(\tau)$. In this case
e n :=e£Aef < ef < P □ 

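Proof of Theorem 3. For convenience, let $W_x = \{X \preccurlyeq x\}$ and
$C_x = \mathcal{C}(x)$. By E.6 we have
$\sqrt{n}(P_n(W_x) - P(W_x)) \rightsquigarrow N(0, P(W_x)(1 - P(W_x)))$
and $\sqrt{n}(\hat p_x - p_x) \rightsquigarrow N(0, p_x(1 - p_x))$.

Moreover, we have

$$
\begin{aligned}
\hat\tau_x - \tau_x
 &= \frac{P_n(W_x)}{\hat p_x} - \frac{P(W_x)}{p_x}
 = \frac{P_n(W_x)}{\hat p_x} - \frac{P_n(W_x)}{p_x} + \frac{P_n(W_x) - P(W_x)}{p_x} \\
 &= -\frac{P_n(W_x)}{\hat p_x} \cdot \frac{\hat p_x - p_x}{p_x} + \frac{P_n(W_x) - P(W_x)}{p_x} \\
 &= -(\hat\tau_x - \tau_x) \frac{\hat p_x - p_x}{p_x} - \tau_x \frac{\hat p_x - p_x}{p_x}
 + \frac{P_n(W_x) - P(W_x)}{p_x}.
\end{aligned}
$$

By Condition E.2, $|(\hat p_x - p_x)/p_x| \lesssim_P \sqrt{v(p)/n} = o_P(1)$, so that

$$
(1 + o_P(1))\, p_x (\hat\tau_x - \tau_x) = -\tau_x (\hat p_x - p_x) + P_n(W_x) - P(W_x)
 = \frac{1}{\sqrt{n}}\, \mathbb{G}_n( 1\{W_x\} - \tau_x 1\{C_x\} ).
$$

Therefore, we have
$p_x \sqrt{n}(\hat\tau_x - \tau_x) = \mathbb{G}_n(1\{W_x\} - \tau_x 1\{C_x\}) + o_P(1)$.
That converges to a zero mean normal distribution with variance

$$
\begin{aligned}
E[(1\{W_x\} - \tau_x 1\{C_x\})^2]
 &= P(W_x) + \tau_x^2 p_x - 2 \tau_x P(W_x) \\
 &= P(W_x)(1 - \tau_x) + \tau_x (\tau_x p_x - P(W_x)) \\
 &= P(W_x)(1 - \tau_x)
\end{aligned}
$$

using $W_x \subset C_x$ and $\tau_x = P(W_x)/p_x$. Finally, we get

$$
\sqrt{n}(\hat\tau_x - \tau_x) \rightsquigarrow N\!\left( 0, \frac{\tau_x (1 - \tau_x)}{p_x} \right).
$$

Note that within $\mathcal{C}_p$, all the functions are bounded by $2/p$ with high
probability for large enough sample size. Therefore, a multidimensional central
limit theorem applies and the covariance structure of a pair
$x, y \in \mathcal{S}$ is given by

$$
\Omega_{x,y} = E\!\left[ \frac{(1\{W_x\} - \tau_x 1\{C_x\})}{p_x} \cdot
 \frac{(1\{W_y\} - \tau_y 1\{C_y\})}{p_y} \right].
$$

After simplification, we obtain

$$
\begin{aligned}
\Omega_{x,y}
 &= \frac{P(W_x \cap W_y)}{P(C_x) P(C_y)} - \tau_x \frac{P(C_x \cap W_y)}{P(C_x) P(C_y)}
 - \tau_y \frac{P(C_y \cap W_x)}{P(C_x) P(C_y)} + \tau_x \tau_y \frac{P(C_x \cap C_y)}{P(C_x) P(C_y)} \\
 &= \tau_x \tau_y \left( \frac{P(W_x \cap W_y)}{P(W_x) P(W_y)} - \frac{P(C_x \cap W_y)}{p_x P(W_y)}
 - \frac{P(W_x \cap C_y)}{P(W_x)\, p_y} + \frac{P(C_x \cap C_y)}{p_x p_y} \right).
\end{aligned}
$$

Finally, asymptotic equicontinuity of $\beta_n(x)$ follows directly from the
asymptotic equicontinuity of $\alpha_n(x)$ implied by E.6 and $p > 0$ being
fixed. □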
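
The variance simplification used in the proof of Theorem 3 can be checked exactly on a small discrete example. The sketch below is only an illustration and not part of the original argument: the five-point distribution `EXAMPLE_PMF`, the componentwise order, and the helper name `comparison_moments` are assumptions chosen for this check. It uses exact rational arithmetic to confirm that $E[(1\{W_x\} - \tau_x 1\{C_x\})^2] = P(W_x)(1 - \tau_x)$, which after dividing by $p_x^2$ gives the asymptotic variance $\tau_x(1 - \tau_x)/p_x$.

```python
from fractions import Fraction


def leq(a, b):
    """Componentwise order on tuples: a <= b in every coordinate (assumed order)."""
    return all(ai <= bi for ai, bi in zip(a, b))


def comparison_moments(pmf, x):
    """Return (P(W_x), p_x, tau_x, E[(1{W_x} - tau_x 1{C_x})^2]) for a finite pmf.

    W_x is the event {X <= x}; C_x is the set of points comparable with x.
    """
    def in_w(y):
        return leq(y, x)

    def in_c(y):
        return leq(y, x) or leq(x, y)

    p_w = sum(m for y, m in pmf.items() if in_w(y))
    p_x = sum(m for y, m in pmf.items() if in_c(y))
    tau_x = p_w / p_x
    second_moment = sum(
        m * (int(in_w(y)) - tau_x * int(in_c(y))) ** 2 for y, m in pmf.items()
    )
    return p_w, p_x, tau_x, second_moment


# Hypothetical five-point distribution in the plane (illustration only).
EXAMPLE_PMF = {
    (0, 0): Fraction(1, 8),
    (1, 1): Fraction(1, 4),
    (0, 2): Fraction(1, 4),
    (2, 0): Fraction(1, 8),
    (2, 2): Fraction(1, 4),
}

if __name__ == "__main__":
    p_w, p_x, tau_x, m2 = comparison_moments(EXAMPLE_PMF, (1, 1))
    print("P(W_x) =", p_w, " p_x =", p_x, " tau_x =", tau_x)
    print("second moment =", m2, " P(W_x)(1 - tau_x) =", p_w * (1 - tau_x))
    print("asymptotic variance =", m2 / p_x**2)
```

Because the identity relies only on $W_x \subset C_x$ and $\tau_x = P(W_x)/p_x$, the same check should pass for any finite distribution and any reflexive, transitive relation supplied in place of `leq`.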



Proof of Corollary 2. The proof of the second result builds upon
arguments in [15, 18]. Based on Theorem 3, we have that for
$\mathcal{C}_{p/2} = \{x \in \mathcal{S} : p_x \ge p/2\}$, the process
$\beta_n(x) := \sqrt{n}(\hat\tau_x - \tau_x)$ converges weakly in
$\ell^\infty(\mathcal{C}_{p/2})$ to a bounded, mean zero Gaussian process
$G_P$. By the Skorohod-Dudley-Wichura representation theorem, there exists a
probability space $(\Omega, \mathcal{A}, P)$ carrying versions $\tilde G_P$ and
$\tilde\beta_n$ of $G_P$ and $\beta_n$ such that
$\sup_{x \in \mathcal{C}_{p/2}} |\tilde\beta_n(x) - \tilde G_P(x)| \to 0$ as
$n$ grows. Next, note that for all $\tau \in \mathcal{U}$,
$\hat x_\tau \in \mathcal{C}_{p/2}$ provided that $\sqrt{v(p)/n} = o(p)$. Thus,

Vn{ T x T ~t) = -Pn{x T ) + Vn(r XT ~r) = o(l) + G P {x T ) + y/n(r$ T - r). □ 

Proof of Theorem 4. Let r* and r* be such that p = p r * and p = p^*. 
Thus, we have x^* and x T * satisfying pf* =p X9t and p T * =p$ „. Moreover, 
let ii n := \Jv{p)/n + e n j p < n" 1 / 2 by assumption. 

First, note that since p <p T *, and >pj * j we have, by E.5, 

t* T 

P - p <Pt* - Pt* = h T * - Px T „ 

= Px T , Px T * - {Px T , ~Px T ,)+Px T * ~Px T , +Px T * -Px T , 
<p()>n(d(x T *,x T *))/y/n + p Ts ^ ~Px T , +Px T , ~Px T *- 
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Note also that by Step 1 in the proof of Theorem 2 we have |rg „ — r*\ <p u n . 
Moreover, p T is locally quadratic around r*. Therefore, 

P-P^P 4>n{d{x T * ,x T *))/^fn + u 2 n + p Xr , - p Xr , . 

Since it holds for any x T * E Q*(r*), 

p-p^,p(f>n( inf d(z T * ,x T *))/y/n 
(B.2) +u n + max {p x - p x } 

2-V*6Q*(r*) 

<P o(n~ 1/2 ) + max - p x } 

x r .6S*(r') 

since u 2 = o(n~ 1 / 2 ), and infa; *6Q*(t*) d(x r * , x T *) = op(l) by Theorem 2. 

Next, by Step 1 in the proof of Theorem 2, for every Xf* there is a partial 
quantile point Xf, d(x^*,x f ) <«„ that is feasible for (3.6) with r*. Thus, 
P x ~* >Px t - Using this inequality, E.5, and that pf* >p T * by definition (2.4), 

P~ P>Px f ~Px T , 

= PXr -Px f ~ {Px T « ~Px t *)+Pxt ~ Px T « + Px T * ~ Px T * 

(B.3) 

> P - (j)n(d(Xf,X T *))/^/n+Px 9 , -p Xr * +Px T * ~Px T , 
> -(j) n (d(Xr,X T *))/y/n+p x ^ ~Px T », 

where x f was chosen to be close to x T *, namely d(xf,x T *) < d(xf,x?*) + 
d(xf*,x T *) <p u n + \t* — t*\. Therefore, (B.3) holds for any x T * G Q*(r*) 
and d(x?*,x T *) < |r* — t* \ = op(l) by Lemma 9 below. Thus, 

(B.4) p- p> -o P (n~ l/2 ) + max {Px T *-Px T *}- 

Combining (B.4) and (B.2), we obtain y/n(p — p) = op(l) + Z p (t*). □ 

Lemma 9. Under the assumptions of Theorem 2, and that $\tau \mapsto p_\tau$ is a
twice differentiable function, let $p = p_{\tau^*}$ and
$\hat p = \hat p_{\hat\tau^*}$. Then $|\hat\tau^* - \tau^*| = o_P(1)$.

Proof. Consider the twice differentiable function $\tau \mapsto p_\tau$. Since
$p_{\tau^*}$ is its strict minimum at the interior of $\mathcal{U}$, we have
$p_\tau - p_{\tau^*} \gtrsim |\tau - \tau^*|^2$ for $\tau \in \mathcal{U}$.

By Step 1 of the proof of Theorem 2, for every r we have that there 
is an that is feasible and |f — r| <p u n = op(l). Thus, 

Pt=Px t >Pxr ^PPf-V v{p)/n > P p T - y / v(p)/n -u% = o P (l)+p T . 
Similarly, since \r Xf - t\ <p u n , 

Pt <p Px T + ^/v{p/2)/n < P p Tx _ + v / v(p/2)/n 
< P p T + y / v(p/2)/n +ul = o P (l) + p T . 
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Therefore, using that pf* < p T * , 

\t~ T*\ 2 < p f » ~Pt* = Op(l) +V -p T * = Op(l). □ 

APPENDIX C: SECTION 4 PROOFS 

Proof of Lemma 4. This follows if $\operatorname{support} \hat 1_K = \mathbb{R}^d$,
where $\hat 1_K$ is the Fourier transform of the indicator function of the set
$K$; see [2], Proposition 3.1. (We proceed as in Proposition 3.2 in [2] with
the necessary modifications.)

Step 1. Let $0 \neq f \in L^1(\mathbb{R}^d)$ be such that
$\operatorname{support} f \subset K$, let
$\hat f(w) = \int_{\mathbb{R}^d} e^{-i w'x} f(x)\, dx = \int_K e^{-i w'x} f(x)\, dx$,
and let $K^\circ = \{y \in \mathbb{R}^d : y'x \le 0 \text{ for all } x \in K\}$
denote the polar cone of $K$. Define the regions (in the complex space
$\mathbb{C}^d$)

$$
H = \{z \in \mathbb{C}^d : \operatorname{Im}(z) \in K^\circ\} \qquad \text{and} \qquad
H_0 = \{z \in \mathbb{C}^d : \operatorname{Im}(z) \in \operatorname{int} K^\circ\}.
$$

It follows from the definition that $\hat f$ can be extended to a bounded
function $g$ in the region $H$ [because $K$ is a proper convex cone, for any
$w \in H_0$ and $x \in K$, we have $\operatorname{Re}(-i w'x) < 0$]. Moreover,
$g$ is analytic in $H_0$ and continuous in $H$. Therefore, $\hat f$ is the
restriction of the bounded analytic function $g$ on the boundary of $H$ [5].
Consequently, $\hat f$ cannot be identically zero on an open subset of
$\mathbb{R}^d$ (which would imply that $\hat f = 0$ and, thus, $f = 0$);
equivalently, $\operatorname{support} \hat f = \mathbb{R}^d$.

Step 2. Next, we consider $1_K$, which is a nonzero bounded Borel function
that is not in $L^1(\mathbb{R}^d)$. By contradiction, assume that $\hat 1_K$
vanishes on a nonempty open set $U$ of $\mathbb{R}^d$, that is,
$(\operatorname{support} \hat 1_K) \cap U = \emptyset$. Let $x_0$ and
$\varepsilon > 0$ be such that $B(x_0, 2\varepsilon) \subset U$.

Let $0 \neq h_1 \in L^1(\mathbb{R}^d)$ be such that $\hat h_1$ is a $C^\infty$
function and $\operatorname{support} \hat h_1 \subset B(0, \varepsilon)$. Then

$$
\operatorname{support} \widehat{(h_1 \cdot 1_K)} = \operatorname{support}(\hat h_1 * \hat 1_K)
 \subseteq \operatorname{support} \hat h_1 + \operatorname{support} \hat 1_K
 \subset \mathbb{R}^d \setminus B(x_0, \varepsilon),
$$

where "$*$" denotes the convolution operator.

On the other hand, $h_1 \cdot 1_K \in L^1(\mathbb{R}^d)$ with
$\operatorname{support}(h_1 \cdot 1_K) \subset K$. Therefore, by Step 1,
$h_1 \cdot 1_K = 0$ almost everywhere on $\mathbb{R}^d$. In turn, $\hat h_1$ is
a $C^\infty$-function of compact support, so $h_1$ is the restriction of an
entire function to $\mathbb{R}^d$, and hence $h_1(x) \neq 0$ almost everywhere
in $\mathbb{R}^d$. Thus, $1_K$ is zero almost everywhere, which gives us a
contradiction since $K$ is a proper convex cone.
□

Proof of Lemma 5. Without loss of generality, we can consider only
connected graphs (otherwise we proceed with each connected component
separately). We provide an algorithm.

For each node, we have $\tau_x p_x = P(X \preccurlyeq x)$. If there is no
incoming arc on $x$, we have that $P(X \preccurlyeq x) = P(X = x)$. For a
general node $x$, if we already computed $P(X = y)$ for all $y \neq x$,
$y \preccurlyeq x$, then we have
$P(X = x) = \tau_x p_x - \sum_{y \neq x,\, y \preccurlyeq x} P(X = y)$.
Otherwise, "backtrack" to consider a $y \neq x$, $y \preccurlyeq x$ for which
$P(X = y)$ is not known. Since there are no cycles, we can only "backtrack" at
most $|\mathcal{S}| < \infty$ times before computing a probability for some
$y$. Thus the procedure terminates in a finite number of steps with all
probabilities. □
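
The backtracking procedure in the proof of Lemma 5 can be sketched as a memoized recursion. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `recover_point_masses`, the dictionary-based inputs `tau` and `p`, and the three-point example are all hypothetical. The relation passed as `leq` is assumed to be a partial order on a finite set, so the recursion cannot cycle.

```python
def recover_point_masses(points, leq, tau, p):
    """Recover P(X = x) for every x from tau_x and p_x on a finite poset.

    Uses tau_x * p_x = P(X <= x) and subtracts the masses of all strict
    predecessors, computing those first ("backtracking") when needed.
    """
    masses = {}

    def mass(x):
        if x not in masses:
            predecessors = [y for y in points if y != x and leq(y, x)]
            # Recursion terminates because a partial order has no cycles.
            masses[x] = tau[x] * p[x] - sum(mass(y) for y in predecessors)
        return masses[x]

    for x in points:
        mass(x)
    return masses


if __name__ == "__main__":
    # Hypothetical poset: "a" lies below "b" and "c"; "b" and "c" are incomparable.
    below = {("a", "a"), ("b", "b"), ("c", "c"), ("a", "b"), ("a", "c")}

    def leq(y, x):
        return (y, x) in below

    # Indices implied by the masses P(a) = 0.2, P(b) = 0.3, P(c) = 0.5.
    tau = {"a": 0.2, "b": 1.0, "c": 1.0}
    p = {"a": 1.0, "b": 0.5, "c": 0.7}
    print(recover_point_masses(["b", "c", "a"], leq, tau, p))
```

Memoizing the masses means each node is resolved once, so the work is dominated by scanning for predecessors, at most $|\mathcal{S}|$ comparisons per node.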

Proof of Theorem 7. The proof follows from the inequality of
Lorentz [35] applied to each component individually. This follows the strategy
of Chernozhukov, Fernández-Val and Galichon [10], who previously used this
inequality to prove a similar result. □



Proof of Corollary 3. If $x_\tau$ is partial-monotone, by Theorem 7 we
have



l^ix du 



1/k 



< 



< 2 



I *^ u ^ ^11 ' dtji 



1/k 


f 1 ~ 


1/k 


+ 


/ || %u II du 






Jo 





1/k 



I cc n x 1^ 1 1 du . 
The second follows by the triangle inequality. □



Proof of Theorem 8. Note that by independence and no point mass,
we have $P(X \succcurlyeq x) = \prod_{j=1}^d (1 - F_j(x_j))$,
$P(X \preccurlyeq x) = \prod_{j=1}^d F_j(x_j)$ and
$p_x = P(X \succcurlyeq x) + P(X \preccurlyeq x)$. Thus,

$$
x_\tau \in \arg\max \left\{ \prod_{j=1}^d (1 - F_j(x_j)) + \prod_{j=1}^d F_j(x_j) :
 \tau \prod_{j=1}^d (1 - F_j(x_j)) = (1 - \tau) \prod_{j=1}^d F_j(x_j) \right\}.
$$

By the independence, we can write $a_j = F_j(x_j)$ and recast the problem as

$$
\max_a \left\{ \prod_{j=1}^d a_j + \prod_{j=1}^d (1 - a_j) :
 (1 - \tau) \prod_{j=1}^d a_j = \tau \prod_{j=1}^d (1 - a_j),\ 0 \le a_j \le 1 \right\}.
$$

By inspection, we have that $0 < a_j < 1$, $j = 1, \ldots, d$, at the optimal.
By the optimality conditions, there is a $\lambda$ such that we have for every
$k = 1, \ldots, d$

$$
\prod_{j \neq k} a_j - \prod_{j \neq k} (1 - a_j)
 = \lambda \left( (1 - \tau) \prod_{j \neq k} a_j + \tau \prod_{j \neq k} (1 - a_j) \right).
$$

This implies that for every $j = 1, \ldots, d$, we have

$$
\frac{1 - a_j}{a_j} = \frac{1 + \lambda \tau}{1 - \lambda (1 - \tau)} \cdot
 \frac{\prod_{i=1}^d (1 - a_i)}{\prod_{i=1}^d a_i},
$$

whose right-hand side does not depend on $j$.

Therefore, $a_k^* = a(\tau)$ for every $k = 1, \ldots, d$. On the other hand, by
feasibility we must have
$\prod_{j=1}^d [a_j/(1 - a_j)] = a(\tau)^d/(1 - a(\tau))^d = \tau/(1 - \tau)$.
Therefore, $a(\tau)/(1 - a(\tau)) = (\tau/(1 - \tau))^{1/d}$, which yields the
result. □

Proof of Theorem 9. By Proposition 1 with
$h(x) = (F_1(x_1), F_2(x_2), \ldots, F_d(x_d))$, we have $\tau_x = \tau_{h(x)}$
so that we can assume that each component of $X$ is a uniform $(0, 1)$ random
variable. Therefore,

$$
P(\tau_X \le \tau) = P\!\left( \prod_{j=1}^d X_j \le \tau \left[ \prod_{j=1}^d X_j
 + \prod_{j=1}^d (1 - X_j) \right] \right).
$$

The first result follows by taking logs and noting that
$Z_j := \log(X_j/(1 - X_j))$ is distributed as a logistic random variable with
zero mean and variance $\pi^2/3$ when $X_j$ is a uniform $(0, 1)$ random
variable.

Next, since $Z_j$ is symmetric around zero,
$P(\tau_X \ge 1/2) = P(\sum_{j=1}^d Z_j \ge 0) = 1/2$. Finally, let
$Z^{(d)} := d^{-1/2} \sum_{j=1}^d Z_j$ and denote its probability density
function by $f_d$. It follows that $\max_z f_d(z) = f_d(0) \le 1/2$. Since
$Z^{(d)}$ is symmetric, we have, for $t \in (0, 1/2)$,

$$
P(|\tau_X - 0.5| > t) = 2 P(\tau_X > 0.5 + t)
 = 2 P\!\left( Z^{(d)} > d^{-1/2} \log \frac{0.5 + t}{0.5 - t} \right).
$$

Thus, using that $\log(1 + x) \le x$ and $f_d(z) \le 1/2$,

$$
P(|\tau_X - 0.5| > t) \ge 2 P\!\left( Z^{(d)} > d^{-1/2} \frac{2t}{0.5 - t} \right)
 \ge 1 - 2 \int_0^{d^{-1/2} \frac{2t}{0.5 - t}} f_d(s)\, ds
 \ge 1 - d^{-1/2} \frac{2t}{0.5 - t}.
$$

Using $t := 0.5 - C d^{-1/2}$ in the expression above,

$$
P(|\tau_X - 0.5| > 0.5 - C d^{-1/2}) \ge 1 - \frac{1 - 2 C d^{-1/2}}{C} \ge 1 - 1/C. \qquad \Box
$$
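
The closed form from Theorem 8 and the concentration phenomenon from Theorem 9 can be explored numerically. The sketch below is an illustration under assumptions and not the authors' code: it takes independent uniform $(0,1)$ components and the componentwise natural order, and the function names `common_marginal_level`, `tau_index` and `tail_fraction`, the sample size and the seed are all hypothetical choices. It computes $\tau_x = \prod_j x_j / (\prod_j x_j + \prod_j (1 - x_j))$ through the sum of log-odds, as in the proof.

```python
import math
import random


def common_marginal_level(tau, d):
    """Level a(tau) solving a / (1 - a) = (tau / (1 - tau)) ** (1 / d)."""
    odds = (tau / (1.0 - tau)) ** (1.0 / d)
    return odds / (1.0 + odds)


def tau_index(x):
    """tau_x for independent uniform(0, 1) components under the componentwise order."""
    log_odds = sum(math.log(xj) - math.log1p(-xj) for xj in x)
    # Numerically stable logistic transform of the summed log-odds.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)


def tail_fraction(d, t, n_draws=2000, seed=0):
    """Monte Carlo estimate of P(|tau_X - 0.5| > t) in dimension d."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        # Clip away from 0 and 1 so the logarithms stay finite.
        x = [min(max(rng.random(), 1e-12), 1.0 - 1e-12) for _ in range(d)]
        if abs(tau_index(x) - 0.5) > t:
            hits += 1
    return hits / n_draws


if __name__ == "__main__":
    a = common_marginal_level(0.9, 3)
    print("a(0.9) for d = 3:", a, " tau at (a, a, a):", tau_index([a, a, a]))
    for d in (2, 10, 50):
        print("d =", d, " estimated P(|tau_X - 0.5| > 0.4):", tail_fraction(d, 0.4))
```

Working with log-odds rather than raw products avoids underflow when $d$ is large, which is the regime in which the indices pile up near 0 and 1.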

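Proof of Lemma 6. We can compute the partial order and the
probabilities $P(X \succcurlyeq x \mid \mathcal{C}(x))$ and
$P(X \preccurlyeq x \mid \mathcal{C}(x))$ in a number of operations bounded by
$O(|\mathcal{S}|)$ for every fixed $x \in \mathcal{S}$. Varying over all
choices of $x \in \mathcal{S}$, we obtain $O(|\mathcal{S}|^2)$
operations. □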
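
The counting argument in the proof of Lemma 6 corresponds to a double loop over the sample. The following sketch is an illustration only: it assumes the componentwise natural order on tuples, and the names `componentwise_leq` and `empirical_partial_indices` are hypothetical. For each point it makes one pass over the sample to count the points below it, above it and comparable with it, giving empirical versions of $\tau_x$ and $p_x$.

```python
def componentwise_leq(a, b):
    """Componentwise natural order: a <= b in every coordinate (assumed order)."""
    return all(ai <= bi for ai, bi in zip(a, b))


def empirical_partial_indices(sample, leq=componentwise_leq):
    """Return a list of (tau_hat, p_hat), one pair per sample point.

    tau_hat is the fraction of comparable points that lie below x, and
    p_hat is the fraction of the sample comparable with x. The two nested
    passes over the sample give a quadratic number of comparisons.
    """
    n = len(sample)
    results = []
    for x in sample:
        below = 0
        comparable = 0
        for y in sample:
            y_below_x = leq(y, x)
            y_above_x = leq(x, y)
            if y_below_x:
                below += 1
            if y_below_x or y_above_x:
                comparable += 1
        # comparable >= 1 because every point is comparable with itself.
        results.append((below / comparable, comparable / n))
    return results


if __name__ == "__main__":
    data = [(0, 0), (1, 1), (0, 2), (2, 0)]
    for point, (tau_hat, p_hat) in zip(data, empirical_partial_indices(data)):
        print(point, "tau_hat =", tau_hat, "p_hat =", p_hat)
```

Any other reflexive, transitive relation can be passed through the `leq` argument; the quadratic count is unchanged as long as a single comparison costs a bounded number of operations.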

Proof of Lemma 7. First, note that under C.2, we have that $K$ is
a convex cone with nonempty interior. Therefore, $K$ has a strict recession
direction, that is, $\exists w \neq 0$ such that
$K + w \subset \operatorname{int} K$. Moreover, if $K \cap -K$ is full
dimensional, $K = \mathbb{R}^d$ and we have $x \succcurlyeq y$ for every
$x, y \in \mathbb{R}^d$ and the result holds trivially. Therefore, we can
assume that $K \cap -K$ is not full dimensional.

Since $K \cap -K$ is not full dimensional and $X$ has no point mass, we have
$P(X \succcurlyeq x \succcurlyeq X) = 0$ for every $x \in \mathcal{S}$.
Therefore $p_x = P(X \succcurlyeq x) + P(X \preccurlyeq x)$. Moreover, $p_x$,
$P(X \preccurlyeq x)$ and $P(X \succcurlyeq x)$ are continuous in $x$.

Note that any pair $(p, x)$ such that $x \in \mathcal{Q}(\tau)$ and $p = p_x$ is
feasible for problem (4.6). By the log-concavity of the probability density
function, $P(X \preccurlyeq x) = P(x - K)$ and
$P(X \succcurlyeq x) = P(x + K)$ are log-concave functions of $x$ by the
Prékopa-Leindler inequality (e.g., see [20]). This shows that (4.6) can be
recast as a convex programming problem.

Next, we will show that the solution to (4.6) also solves (2.3). If
$p^* = p_{x^*}$, then both constraints are active at the optimal point, and the
result follows. Note that at least one constraint must be active at
$(p^*, x^*)$.

Suppose $p^* < p_{x^*}$, in which case $x^* \notin \mathcal{Q}(\tau)$. Without
loss of generality, assume that $P(X \succcurlyeq x^*) > (1 - \tau) p^*$.
Define the continuous functions $u(t) = P(X \succcurlyeq x^* + t w)$ and
$\ell(t) = P(X \preccurlyeq x^* + t w)$, which are, respectively, decreasing
and increasing in $t$. For some $t > 0$, we have $u(t) > (1 - \tau) p^*$ and
$\ell(t) > \tau p^*$, so both constraints are slack and $p^*$ could be
increased, which contradicts the optimality of $(p^*, x^*)$. □

Proof of Theorem 10. From Lemma 7, it follows that we can recast
the problem as the convex programming problem defined in (4.6). For
$p \le p_\tau$, define the convex set

$$
H(p) := \{ (v, x) \in \mathbb{R} \times \mathcal{S} :
 \log P(X \succcurlyeq x) \ge \log(1 - \tau) + v,\
 \log P(X \preccurlyeq x) \ge \log \tau + v,\ \log p \le v \le 0 \},
$$

where $v = \log p$ for $p$ in (4.6). For an arbitrary $\varepsilon > 0$, note
that for every $x$ we can approximate $P(X \succcurlyeq x)$ and
$P(X \preccurlyeq x)$ up to a multiplicative factor of $1 + \varepsilon$ using
the integration procedure for log-concave distributions based on random walks
proposed by Lovász and Vempala [36]. Relying on these results, we can construct
an $\varepsilon_0$-approximate membership oracle whose complexity is given by



where $\varepsilon_0 = p_\tau \varepsilon$. Note that by controlling the error
in the computation of $P(X \preccurlyeq x)$ and $P(X \succcurlyeq x)$ by a
factor of $1 + \varepsilon$, we control the error in the computation of
$\tau_x$ by an additive error of $\varepsilon$.

Based on this membership oracle, we can apply the results in [36] for
optimization, which requires $O^*(d^{4.5})$ calls of the constructed membership
oracle. □